We show that various aspects of k-automatic sequences (such as having an unbordered
factor of length n) are both decidable and effectively enumerable. As a consequence it
follows that many related sequences are either k-automatic or k-regular. These include
many sequences previously studied in the literature, such as the recurrence function, the
appearance function, and the repetitivity index. We also give some new characterizations
of the class of k-regular sequences. Many results extend to other sequences defined in
terms of Pisot numeration systems.



1. Introduction 

Let $x = (a(n))_{n \ge 0}$ be an infinite sequence over a finite alphabet $A$. We write
$x[i] = a(i)$, and we let $x[i..i+n-1]$ denote the factor of length $n$ beginning at
position $i$.

An infinite sequence $x$ is said to be k-automatic if it is computable by a finite
automaton taking as input the base-$k$ representation of $n$, and having $a(n)$ as the
output associated with the last state encountered [5].

For example, in Figure 1 we see an automaton generating the Thue-Morse
sequence $t = t_0 t_1 t_2 \cdots = 011010011001\cdots$. The input is $n$, expressed in base 2,
and the output is the number contained in the state last reached.
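The state diagram of Figure 1 is not reproduced in this text, so the following minimal Python sketch assumes the standard two-state Thue-Morse automaton: reading a 1 switches state, reading a 0 stays put, and each state outputs its own label. The function name is illustrative and does not come from the paper.

```python
def thue_morse_dfa(n):
    """Run the (assumed) two-state automaton on the base-2 digits of n."""
    state = 0                       # start in the state whose output is 0
    for digit in format(n, "b"):    # base-2 representation, most significant digit first
        if digit == "1":
            state = 1 - state       # a 1 toggles the state; a 0 leaves it unchanged
    return state                    # the output is the label of the last state reached


prefix = "".join(str(thue_morse_dfa(n)) for n in range(12))
print(prefix)
```

The printed prefix can be compared directly with the first twelve terms of t quoted above.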



Fig. 1. A finite automaton generating a sequence

Honkala [23] showed that, given an automaton, it is decidable if the sequence
it generates is ultimately periodic. Later, Leroux [26] gave a polynomial-time
algorithm for the problem.

Recently, Allouche, Rampersad, and Shallit [2] found a different proof of
Honkala's result using a more general technique. They showed that their technique
suffices to show that the following properties (and many more) are decidable for
k-automatic sequences x:

(a) Given a rational number r > 1, whether x is r-power-free; 

(b) Given a rational number r > 1, whether x contains infinitely many occur- 
rences of r-powers; 

(c) Given a rational number r > 1, whether x contains infinitely many distinct 
r-powers; 

(d) Given a length l, whether x avoids palindromes of length > l.

Related results have recently been given by Halava, Harju, Kärki, and Rigo [22].

In this paper we show that many additional properties of automatic sequences
are decidable using the same general technique. More significantly, we also show
that related enumeration questions on automatic sequences (such as counting the
number of distinct factors of length n) can be solved using a similar technique, in
an entirely effective manner. As a consequence, we recover or improve results due
to Mossé [27]; Allouche, Baake, Cassaigne, and Damanik [1]; Currie and Saari [15];
Garel [21]; Fagnot [17]; and Brown, Rampersad, Shallit, and Vasiga [8].

Our main results about decidability are given in Section 2 and our main results
about enumeration are given in Section 9.

Throughout this paper, $k$ denotes a fixed integer $\ge 2$, the symbol $\mathbb{N}$ denotes
the non-negative integers $\{0, 1, 2, \ldots\}$, and the symbol $\mathbb{N}_\infty$ denotes the "extended"
non-negative integers $\mathbb{N} \cup \{\infty\}$.

2. Connections with logic and new decidability results 

After the publication of [2], the third author noticed that the technique used there
was, at its core, very similar to previous techniques developed by Büchi, Bruyère,
Michaux, Villemaire, and others, involving formal logic; see, e.g., [TU]. This was later
independently observed by the first author, as well as by Véronique Bruyère. As it
turns out, the properties (a)-(d) above are decidable because they are expressible
as predicates in the first-order structure $(\mathbb{N}, +, V_k)$, where $V_k(n)$ is the largest power
of $k$ dividing $n$.

We briefly recall the technique discussed in [2] in the context of a particular
example. Suppose we want to decide if an automatic sequence x is squarefree
(contains no nonempty square factor). Given an automaton $M$ generating a k-automatic
sequence x, we create, via a series of transformations, a new automaton $M'$ that
accepts the base-$k$ representations of integers corresponding to the squares in x.
For example, $M'$ could accept those integers corresponding to the starting position
of each square, or those integers corresponding to the lengths of the squares. The
operations we can use in constructing $M'$ include digit-by-digit addition or
subtraction (with carry, if necessary), comparison, and lookup of the corresponding term
in x (which comes from simulation of $M$). Nondeterminism can be used to
implement "$\exists$", and "$\forall$" can be implemented by nondeterminism combined with suitable
negations.

Ultimately, then, deciding if x is squarefree corresponds to verifying that
$L(M') = \emptyset$ for the $M'$ we construct. Deciding whether x contains only finitely many
square occurrences corresponds to verifying that $L(M')$ is finite. Both can easily
be done by the standard methods for automata, using depth-first or breadth-first
search on the underlying state diagram of the automaton.

In this paper, we always assume that numbers are encoded in base $k$ using the
digits in $\Sigma_k = \{0, 1, \ldots, k-1\}$. The canonical encoding of $n$ is the one with no
leading zeroes and is denoted $(n)_k$. Similarly, if $w = a_1 \cdots a_n \in \Sigma_k^*$, then by $[w]_k$ we
mean $\sum_{1 \le i \le n} a_i k^{n-i}$, the integer that $w$ represents. Often we will deal with reversed
representations, where the least significant digit appears first. For example, in the
reversed representation, 13 is represented in base 2 by the word 1011.

Sometimes we will need to encode pairs, triples, or $r$-tuples of integers. We
handle these by first padding the reversed representation of the smaller integer
with trailing zeroes, and then coding the $r$-tuple as a word over $\Sigma_k^r$. For example,
the pair (20, 13) could be represented in base 2 as

[0,1] [0,0] [1,1] [0,1] [1,0],

where the first components spell out 00101 and the second components spell out
10110. Of course, there are other possible representations, such as

[0,1] [0,0] [1,1] [0,1] [1,0] [0,0],

which correspond to non-canonical representations having trailing zeroes. In general,
we permit these.
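The reversed, zero-padded encoding of tuples described above can be sketched in Python as follows. The helper names (`reversed_digits`, `encode_tuple`, `decode_tuple`) and the `extra_padding` parameter are illustrative choices, not notation from the paper.

```python
def reversed_digits(n, k):
    """Base-k digits of n, least significant digit first (empty list for n = 0)."""
    digits = []
    while n > 0:
        digits.append(n % k)
        n //= k
    return digits


def encode_tuple(values, k, extra_padding=0):
    """Encode a tuple of integers as a word over Sigma_k^r: pad the shorter
    reversed representations with trailing zeroes, then read off columns."""
    reps = [reversed_digits(v, k) for v in values]
    length = max(len(r) for r in reps) + extra_padding
    padded = [r + [0] * (length - len(r)) for r in reps]
    return [tuple(column) for column in zip(*padded)]


def decode_tuple(word, k):
    """Inverse of encode_tuple for a non-empty word; trailing zero symbols are harmless."""
    r = len(word[0])
    return tuple(sum(sym[c] * k**pos for pos, sym in enumerate(word)) for c in range(r))


canonical = encode_tuple((20, 13), 2)
padded = encode_tuple((20, 13), 2, extra_padding=1)
print(canonical)
print(padded)
```

Both words decode to the same pair, which is why non-canonical representations with trailing zeroes can be permitted.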

Thus, the main idea of [2] can be restated as follows: 

Theorem 1. If we can express a property of a k-automatic sequence x using
quantifiers, logical operations, integer variables, the operations of addition, subtraction,
indexing into x, and comparison of integers or elements of x, then this property is
decidable.

We illustrate the idea with the following new result. A word $w$ is bordered if
it begins and ends with the same word $x$ with $0 < |x| \le |w|/2$. (An example in
English is ingoing, which begins and ends with ing.) Otherwise it is unbordered.

Theorem 2. Let $x = a(0)a(1)a(2)\cdots$ be a k-automatic sequence. Then the
associated infinite sequence $b = b(0)b(1)b(2)\cdots$ defined by

\[ b(n) = \begin{cases} 1, & \text{if } x \text{ has an unbordered factor of length } n; \\ 0, & \text{otherwise;} \end{cases} \]

is k-automatic.

Proof. The sequence x has an unbordered factor of length $n$
iff

$\exists j \ge 0$ such that the factor of length $n$ beginning at position $j$ of x is unbordered
iff

there exists an integer $j \ge 0$ such that for all possible lengths $l$ with $1 \le l \le n/2$,
there is an integer $i$ with $0 \le i < l$ such that the $i$'th letter in the supposed border
of length $l$ beginning and ending the factor of length $n$ beginning at position $j$ of x
actually differs in the $i$'th position
iff

there exists an integer $j \ge 0$ such that for all integers $l$ with $1 \le l \le n/2$ there
exists an integer $i$ with $0 \le i < l$ such that $a(j+i) \ne a(j+n-l+i)$.

To carry out this test, we first create an NFA that given the encoding of $(j, l, n)$
guesses the base-$k$ representation of $i$, digit-by-digit, checks that $i < l$, computes
$j+i$ and $j+n-l+i$ on the fly, and checks that $a(j+i) \ne a(j+n-l+i)$. If such an $i$
is found, it accepts. We then convert this to a DFA, and interchange accepting and
nonaccepting states. This DFA $M_1$ accepts $(j, l, n)$ such that there is no $i$, $0 \le i < l$,
such that $a(j+i) \ne a(j+n-l+i)$. We then use $M_1$ as a subroutine to build an
NFA $M_2$ that on input $(j, n)$ guesses $l$, checks that $1 \le l \le n/2$, and calls $M_1$ on
the result. We convert this to a DFA and interchange accepting and nonaccepting
states to get $M_3$. Finally, this $M_3$ is used as a subroutine to build an NFA $M_4$ that
on input $n$ guesses $j$ and calls $M_3$.

The set of such integers $n$ then forms a k-automatic sequence. □
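The first-order predicate in the proof can be evaluated by brute force on a finite window, which is a handy sanity check before building any automaton. The sketch below is not the automaton construction of the proof: the existential quantifier over $j$ is replaced by a bounded search (`search_limit` is an assumption of this sketch), and all function names are illustrative.

```python
def thue_morse(n):
    """t(n): parity of the number of 1 bits of n."""
    return bin(n).count("1") % 2


def is_unbordered_at(a, j, n):
    """For every border length l with 1 <= l <= n/2 there is an i, 0 <= i < l,
    such that a(j+i) != a(j+n-l+i)."""
    return all(
        any(a(j + i) != a(j + n - l + i) for i in range(l))
        for l in range(1, n // 2 + 1)
    )


def has_unbordered_factor(a, n, search_limit):
    """Bounded stand-in for 'there exists j >= 0': only j < search_limit is tried."""
    return any(is_unbordered_at(a, j, n) for j in range(search_limit))


factor = "".join(str(thue_morse(p)) for p in range(39, 70))
print(factor, is_unbordered_at(thue_morse, 39, 31))
```

A `True` answer from `has_unbordered_factor` is conclusive, while a `False` answer only says that no witness was found among the positions tried.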

Example 3. Consider the problem of determining for which lengths the
Thue-Morse sequence has an unbordered factor. Currie and Saari [16] proved that if
$n \not\equiv 1 \pmod{6}$, then there is an unbordered factor of length $n$. (Also see [30],
Lemma 4.10 and Problem 4.1.) However, this is not a necessary condition, as

t[39..69] = 0011010010110100110010110100101,

which is an unbordered factor of length 31. They left it as an open problem to give
a complete characterization of the lengths for which t has an unbordered factor.
Our method shows the characteristic sequence of such lengths is 2-automatic.

Further, we conjecture that there is an unbordered factor of length $n$ in t if and
only if the base-2 expansion of $n$ (starting with the most significant digit) is not of
the form $1(01^*0)^*10^*1$.

In principle this could be verified, purely mechanically, by our method, but we
have not yet done so.

We now turn to deciding if a given automatic sequence x has infinite critical
exponent (e.g., [24]). If a word $w$ can be written in the form $x^n x'$, where $n \ge 1$ is
an integer and $x'$ is a prefix of $x$, then we say it is a fractional power with exponent
$|w|/|x|$. For example, ingoing has exponent 7/4. The largest such exponent is called
the exponent of the word. The critical exponent of x is the supremum, over all finite
factors $f$ of x, of the exponent of $f$.
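As a small illustration of the definition, writing $w = x^n x'$ with $x'$ a prefix of $x$ is the same as saying that $|x|$ is a period of $w$, so the exponent of a nonempty word is its length divided by its smallest period. The following Python sketch uses that observation; the function names are illustrative.

```python
from fractions import Fraction


def smallest_period(w):
    """Smallest p >= 1 with w[i] == w[i+p] for all valid i (w must be nonempty)."""
    for p in range(1, len(w) + 1):
        if all(w[i] == w[i + p] for i in range(len(w) - p)):
            return p


def exponent(w):
    """Exponent of a nonempty word: |w| divided by its smallest period."""
    return Fraction(len(w), smallest_period(w))


print(exponent("ingoing"))
```

Exact rational arithmetic is used so that exponents such as 7/4 are not rounded.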

Theorem 4. The following question is decidable: given a k-automatic sequence,
does it contain powers of arbitrarily large exponent?

Proof. x has powers of arbitrarily high exponent
iff

the set of pairs

\[ S := \{ (n, j) : \exists i \ge 0 \text{ such that for all } t \text{ with } 0 \le t < n \text{ we have } x[i+t] = x[i+j+t] \} \]

contains pairs $(n, j)$ with $n/j$ arbitrarily large
iff

for all $i \ge 0$, $S$ contains a pair $(n, j)$ with $n > j \cdot 2^i$
iff

$L$, the set of base-$k$ encodings of pairs in $S$, contains, for each $i$, words ending in

\[ \overbrace{[*,0][*,0]\cdots[*,0]}^{i}\,[b,0] \]

for some $b \ne 0$, where $*$ means any digit.

But we can easily decide if a regular language contains words ending in
arbitrarily long words of this form. □

In a similar fashion we can show 

Theorem 5. The following question is decidable: given a k-automatic sequence x,
does x contain arbitrarily large unbordered factors?

Now we turn to questions of recurrence. 

An infinite word $a = (a(n))_{n \ge 0}$ is said to be recurrent if every factor that occurs
at least once in a occurs infinitely often. Equivalently, a word is recurrent if and
only if for each occurrence of a factor of a, there exists a later occurrence of that
factor in a. Equivalently, for every $n \ge 0$, $r \ge 1$, there exists $m > n$ such that
$a(n+j) = a(m+j)$ for $0 \le j < r$.

Similarly, an infinite word $a = (a(n))_{n \ge 0}$ is said to be uniformly recurrent if
every factor that occurs at least once in a occurs infinitely often, with bounded gaps
between consecutive occurrences. Equivalently, a word $a = (a(n))_{n \ge 0}$ is uniformly
recurrent iff for every $r \ge 1$ there exists $t > 0$ such that for every $n \ge 0$ there exists
$m > 0$ with $n < m < n + t$ such that $a(n+i) = a(m+i)$ for $0 \le i < r$.

Thus we recover the following recent result of Nicolas and Pritykin [28] : 

Theorem 6. It is decidable if a k-automatic sequence is recurrent or uniformly 
recurrent. 

We now turn to questions of factors shared by two k-automatic sequences. Fagnot
[17] showed that it is decidable whether two such sequences $x = a(0)a(1)\cdots$ and
$y = b(0)b(1)\cdots$ have exactly the same set of factors. This is also decidable by our
methods, as follows:

Every factor of $x = a(0)a(1)\cdots$ is a factor of $y = b(0)b(1)\cdots$
iff

for all $i \ge 0$, $n \ge 1$ there exists $j \ge 0$ such that $x[i..i+n-1] = y[j..j+n-1]$
iff

for all $i \ge 0$, $n \ge 1$ there exists $j \ge 0$ such that for all $t$, $0 \le t < n$, we have
$a(i+t) = b(j+t)$.

The sequences x and y have the same set of factors iff this condition holds and the
analogous condition, with the roles of x and y exchanged, holds as well. In particular,
the question of whether the set of factors of one k-automatic word forms a subset of
the set of factors of another k-automatic word is decidable.

3. Enumeration 

We now turn to questions of enumeration. A typical example of the kind of question
we are interested in is, given an automatic sequence $(a(n))_{n \ge 0}$, how many distinct
factors are there of length $n$? Our goal in the remainder of this paper is to show that
these kinds of questions often have a useful answer in terms of k-regular sequences.
A sequence $(a(n))_{n \ge 0}$ is k-regular if the module generated by its k-kernel, which is
the set of all subsequences of the form

\[ \{ (a(k^e n + c))_{n \ge 0} : e \ge 0,\ 0 \le c < k^e \}, \]

is finitely generated [3, 4, 5, 7]. The k-regular sequences play the same role for
integer-valued sequences as the k-automatic sequences play for sequences over a
finite alphabet. Classical examples of k-regular sequences include polynomials in $n$,
and $s_k(n)$, the sum of the base-$k$ digits of $n$.

Not only does this interpretation give an explicit and efficient algorithm for
computing the values of the sequence in question, it also gives a way to compute
many related quantities that, up to now, have received extended treatments in the
literature using a wide variety of techniques. Our work therefore extends and unifies
many results in the literature.

In order to make our results really precise, we need several sections of
preliminary definitions and results. This is what follows in Sections 4-8. We resume the
exposition of our results in Section 9.

4. k-regular sequences

Cobham [14] showed that a sequence $(s(n))_{n \ge 0}$ is k-automatic iff its k-kernel is
finite. Generalizing this notion, Allouche and Shallit [3, 4] introduced the notion of
k-regular sequence over a ring $R$. A sequence is k-regular if the module generated by
its k-kernel is finitely generated. In particular, Allouche and Shallit were interested
in the cases where the underlying ring is $\mathbb{Z}$ or $\mathbb{Q}$. However, as noted in the recent
book of Berstel and Reutenauer [7], it makes more sense to define the k-regular
sequences over a semiring instead of a ring. The advantage is greater generality, but
at the cost of giving up part of the characterization in terms of the k-kernel.

Example 7. To illustrate this, consider the sequence $s_2(n)$ defined to be the sum
of the bits in the base-2 representation of $n$. For example, $s_2(27) = 4$. Then $s_2(n)$
is 2-regular over $\mathbb{Z}$, as its 2-kernel $K$ generates a module $M$ that is generated by
the sequence $s_2(n)$ itself and the constant sequence 1. Indeed, we have

\begin{align*}
K &= \{ (s_2(2^e n + a))_{n \ge 0} : e \ge 0,\ 0 \le a < 2^e \} \\
  &= \{ (s_2(n) + s_2(a))_{n \ge 0} : a \ge 0 \} \\
  &= \{ (s_2(n) + c)_{n \ge 0} : c \ge 0 \},
\end{align*}

so that every sequence in the 2-kernel is a $\mathbb{Z}$-linear combination of $(s_2(n))_{n \ge 0}$ and
the constant sequence 1. Indeed, it is even true that every sequence in $K$ is an
$\mathbb{N}$-linear combination of $(s_2(n))_{n \ge 0}$ and the constant sequence 1.
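The identity behind the description of the 2-kernel, namely $s_2(2^e n + a) = s_2(n) + s_2(a)$ for $0 \le a < 2^e$, is easy to confirm numerically. A minimal Python sketch follows; the name `digit_sum` is an illustrative choice.

```python
def digit_sum(n, k=2):
    """s_k(n): the sum of the base-k digits of n."""
    total = 0
    while n > 0:
        total += n % k
        n //= k
    return total


# Each kernel element n -> s_2(2^e n + a) equals s_2(n) plus the constant s_2(a).
kernel_identity_holds = all(
    digit_sum(2**e * n + a) == digit_sum(n) + digit_sum(a)
    for e in range(5)
    for a in range(2**e)
    for n in range(50)
)
print(digit_sum(27), kernel_identity_holds)
```

The check works because the low $e$ bits of $2^e n + a$ are exactly the bits of $a$, so no carries occur.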

In [3], the authors show that every sequence in the k-kernel $K$ of a k-regular
sequence over $\mathbb{Z}$ is generated by some finite subset of $K$. For example, for $(s_2(n))_{n \ge 0}$,
the 2-kernel $K$ is generated by $(s_2(n))_{n \ge 0}$ and $(s_2(2n+1))_{n \ge 0}$. However, in this
example, there is no finite subset $K' \subseteq K$ such that every sequence in $K$ can be
written as an $\mathbb{N}$-linear combination of the sequences in $K'$. For every sequence in $K$
is of the form $s_2(n) + c$ with $c \ge 0$. If we take some finite subset $K' \subseteq K$, then the
sequences in $K'$ of the form $s_2(n) + c$ all satisfy $c \le C$ for some finite $C$. We then
cannot get $s_2(n) + C + 1$ as an $\mathbb{N}$-linear combination of the sequences in $K'$ (as any
combination with constant term exceeding $C$ would have at least two copies of $s_2(n)$).

This means that to define $(\mathbb{N}, k)$-regular sequences, we have to give up one
characterization in terms of the kernel, given in [3].

5. (R, k)-regular sequences

In this section we give a rigorous definition of $(R, k)$-regular sequences and show
that there are a number of alternative characterizations that are equivalent.
First, we give some definitions.

Let $\Sigma_k$ denote the alphabet $\{0, 1, \ldots, k-1\}$. Let $C_k = \{\epsilon\} \cup (\Sigma_k - \{0\})\Sigma_k^*$ denote
the set of canonical base-$k$ expansions, that is, those with no leading zero. Let $R$
be a semiring. A formal series is a map $h : \Sigma^* \to R$. For historical reasons, $h(w)$ is
often written as $(h, w)$ and $h$ itself is expressed as the formal sum $\sum_{w \in \Sigma^*} (h, w)\, w$.
A formal series $h$ taking values in a semiring $R$ is said to be $R$-recognizable if
$(h, w) = u \mu(w) v$ for all $w \in \Sigma^*$, where $\mu$ is a morphism from $\Sigma^*$ to the set of $n \times n$
matrices, $u$ is a $1 \times n$ matrix (or row vector), and $v$ is an $n \times 1$ matrix (or column
vector), all with entries in $R$. The triple $(u, \mu, v)$ is called a linear representation of
$h$.

The reader is directed to [25, 32] and especially [7] for more information about
recognizable series.
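To make the definition of a linear representation concrete, here is a small Python sketch that evaluates $u\mu(w)v$ for the series counting the 1's in a binary word, so that $(h, w) = s_2([w]_2)$. The particular matrices are one convenient choice made for this illustration, not taken from the paper, and the function names are likewise illustrative.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def evaluate(u, mu, v, w):
    """Compute u * mu(w) * v, where mu is extended to words as a morphism."""
    row = [list(u)]
    for letter in w:
        row = mat_mul(row, mu[letter])
    return sum(r * x for r, x in zip(row[0], v))


# Assumed example: mu(1) adds 1 to the top-right entry, mu(0) is the identity.
U = [1, 0]
MU = {"0": [[1, 0], [0, 1]], "1": [[1, 1], [0, 1]]}
V = [0, 1]

print(evaluate(U, MU, V, "11011"))
```

Because $\mu(0)$ is the identity here, leading and trailing zeroes in $w$ do not change the value.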

We recall the following standard result about recognizable series ([7], Ex. 2.1.3,
p. 42):

Lemma 8. Let $R$ be a semiring, and let $f : \Sigma_k^* \to R$ be an $R$-recognizable series.
Then the series $g : \Sigma_k^* \to R$ defined by $(g, w) = (f, w^R)$ is also $R$-recognizable.

Next, we prove a somewhat technical lemma that essentially says that we can
disregard leading 0's in the representation of a word.

Lemma 9. Let $R$ be a semiring, and let $f : \Sigma_k^* \to R$ be an $R$-recognizable series.
Then there exists another $R$-recognizable series $g$ such that $(g, 0^i w) = (f, w)$ for all
$i \ge 0$ and all $w \in C_k$. Furthermore, there exists a linear representation $(u', \mu', v')$
for $g$ satisfying $u'\mu'(0) = u'$.

Proof. Suppose $(u, \mu, v)$ is a rank-$n$ linear representation of $f$. Let $I_n$ denote the
$n \times n$ identity matrix. Define $u', \mu', v'$ as follows:

\[
u' = [\,\overbrace{0 \cdots 0}^{n}\ \ u\,], \qquad
\mu'(a) = \begin{cases}
\begin{bmatrix} \mu(0) & 0 \\ 0 & I_n \end{bmatrix}, & \text{if } a = 0; \\[2ex]
\begin{bmatrix} \mu(a) & 0 \\ \mu(a) & 0 \end{bmatrix}, & \text{if } a \ne 0;
\end{cases} \qquad
v' = \begin{bmatrix} v \\ v \end{bmatrix},
\]

and set $g = (u', \mu', v')$.

To see that this works, we will first prove the following two facts:

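\[ \mu'(0^i) = \begin{bmatrix} \mu(0^i) & 0 \\ 0 & I_n \end{bmatrix} \tag{1} \]

for $i \ge 0$, and

\[ \mu'(0^i w) = \begin{bmatrix} \mu(0^i w) & 0 \\ \mu(w) & 0 \end{bmatrix} \tag{2} \]

for $i \ge 0$ and $w \in (\Sigma_k - \{0\})\Sigma_k^*$. The claim (1) is a trivial induction, and is omitted.
Let us prove (2) by induction on $|w|$. The base case is $|w| = 1$. In that case $w = a$,
where $a \in \Sigma_k - \{0\}$. From the definition we have

\[ \mu'(a) = \begin{bmatrix} \mu(a) & 0 \\ \mu(a) & 0 \end{bmatrix}, \]

so, using (1), we get

\[ \mu'(0^i a) = \mu'(0^i)\mu'(a)
= \begin{bmatrix} \mu(0^i) & 0 \\ 0 & I_n \end{bmatrix} \begin{bmatrix} \mu(a) & 0 \\ \mu(a) & 0 \end{bmatrix}
= \begin{bmatrix} \mu(0^i)\mu(a) & 0 \\ \mu(a) & 0 \end{bmatrix}
= \begin{bmatrix} \mu(0^i a) & 0 \\ \mu(a) & 0 \end{bmatrix}, \]

as desired. For the induction step, assume the result (2) holds for all $w'$ with
$0 < |w'| < |w|$; we prove it for $w$. Write $w = ax$ with $a \in \Sigma_k - \{0\}$. There are two cases:
(i) $x = 0^j$ for some $j \ge 1$, and (ii) $x = 0^j y$, where $j \ge 0$ and $y \in C_k$ with $y \ne \epsilon$.
In case (i) we have, by (1), that

\[ \mu'(x) = \begin{bmatrix} \mu(0^j) & 0 \\ 0 & I_n \end{bmatrix} \]

and hence

\[ \mu'(w) = \mu'(a)\mu'(x)
= \begin{bmatrix} \mu(a)\mu(0^j) & 0 \\ \mu(a)\mu(0^j) & 0 \end{bmatrix}
= \begin{bmatrix} \mu(ax) & 0 \\ \mu(ax) & 0 \end{bmatrix}
= \begin{bmatrix} \mu(w) & 0 \\ \mu(w) & 0 \end{bmatrix}, \]

as desired. In case (ii) we have, by induction, that

\[ \mu'(x) = \mu'(0^j y) = \begin{bmatrix} \mu(0^j y) & 0 \\ \mu(y) & 0 \end{bmatrix} \]

and again we have

\[ \mu'(w) = \mu'(a)\mu'(x)
= \begin{bmatrix} \mu(a)\mu(0^j y) & 0 \\ \mu(a)\mu(0^j y) & 0 \end{bmatrix}
= \begin{bmatrix} \mu(ax) & 0 \\ \mu(ax) & 0 \end{bmatrix}
= \begin{bmatrix} \mu(w) & 0 \\ \mu(w) & 0 \end{bmatrix}, \]

as desired.
Therefore

\[ \mu'(0^i w) = \mu'(0^i)\mu'(w)
= \begin{bmatrix} \mu(0^i) & 0 \\ 0 & I_n \end{bmatrix} \begin{bmatrix} \mu(w) & 0 \\ \mu(w) & 0 \end{bmatrix}
= \begin{bmatrix} \mu(0^i w) & 0 \\ \mu(w) & 0 \end{bmatrix}, \]

which completes the induction.

Now that we know that (1) and (2) hold, we have, if $w = 0^i$ for some $i \ge 0$, that

\[ (g, w) = u'\mu'(w)v'
= [\,0 \cdots 0\ \ u\,] \begin{bmatrix} \mu(0^i) & 0 \\ 0 & I_n \end{bmatrix} \begin{bmatrix} v \\ v \end{bmatrix}
= uv = (f, \epsilon), \]

as desired.

If $w = 0^i z$ with $i \ge 0$ and $z \in (\Sigma_k - \{0\})\Sigma_k^*$, then

\[ (g, w) = u'\mu'(w)v'
= [\,0 \cdots 0\ \ u\,] \begin{bmatrix} \mu(0^i z) & 0 \\ \mu(z) & 0 \end{bmatrix} \begin{bmatrix} v \\ v \end{bmatrix}
= u\mu(z)v = (f, z), \]

as desired.

Finally, note that

\[ u'\mu'(0) = [\,0 \cdots 0\ \ u\,] \begin{bmatrix} \mu(0) & 0 \\ 0 & I_n \end{bmatrix} = [\,0 \cdots 0\ \ u\,] = u'. \]

□

Combining the previous two lemmas, we get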



Lemma 10. Let $R$ be a semiring, and let $f : \Sigma_k^* \to R$ be an $R$-recognizable series.
Then there exists another $R$-recognizable series $g$ such that $(g, w0^i) = (f, w)$ for all
$i \ge 0$ and all $w \in C_k^R$. Furthermore, there exists a linear representation $(u', \mu', v')$
for $g$ satisfying $\mu'(0)v' = v'$.
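The block construction used to prove Lemma 9 can be checked numerically. The Python sketch below takes the construction to be $u' = [0 \cdots 0\ u]$, $\mu'(0) = \mathrm{diag}(\mu(0), I_n)$, $\mu'(a)$ equal to two stacked copies of $[\mu(a)\ 0]$ for $a \ne 0$, and $v'$ equal to two stacked copies of $v$. It is tested on an assumed example series $f(w) = |w|$, chosen because it is sensitive to leading zeroes; all names are illustrative.

```python
def mat_mul(A, B):
    return [
        [sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def evaluate(u, mu, v, w):
    """Compute u * mu(w) * v for a word w."""
    row = [list(u)]
    for letter in w:
        row = mat_mul(row, mu[letter])
    return sum(r * x for r, x in zip(row[0], v))


def block(tl, tr, bl, br):
    """Assemble a 2 x 2 block matrix from four equally sized blocks."""
    return [a + b for a, b in zip(tl, tr)] + [a + b for a, b in zip(bl, br)]


def pad_leading_zeros(u, mu, v):
    """Build (u', mu', v') from (u, mu, v) following the block construction above."""
    n = len(u)
    zero = [[0] * n for _ in range(n)]
    ident = [[int(i == j) for j in range(n)] for i in range(n)]
    mu2 = {
        a: block(M, zero, zero, ident) if a == "0" else block(M, zero, M, zero)
        for a, M in mu.items()
    }
    return [0] * n + list(u), mu2, list(v) + list(v)


# Assumed example: (f, w) = |w| over the alphabet {0, 1}.
U, V = [1, 0], [0, 1]
MU = {"0": [[1, 1], [0, 1]], "1": [[1, 1], [0, 1]]}
U2, MU2, V2 = pad_leading_zeros(U, MU, V)
print(evaluate(U, MU, V, "001"), evaluate(U2, MU2, V2, "001"))
```

Here the original series gives 3 on the word 001, while the padded series ignores the two leading zeroes and returns the value of $f$ on 1.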

We are now ready to state our equivalence theorem. This result can be viewed 
as an expanded version of [7], Prop. 1.1, p. 84. 

Theorem 11. Let $(f(n))_{n \ge 0}$ be a sequence taking values in a semiring $R$. The
following are equivalent.

(a) There exist finitely many sequences $(f_1(n))_{n \ge 0}, \ldots, (f_r(n))_{n \ge 0}$ such that

(i) $(f(n))_{n \ge 0}$ is an $R$-linear combination of the $(f_i(n))_{n \ge 0}$, and
(ii) for each $i$ and $a$ with $1 \le i \le r$ and $0 \le a < k$, the subsequence
$(f_i(kn+a))_{n \ge 0}$ is an $R$-linear combination of the $(f_j(n))_{n \ge 0}$.

(b) There exist finitely many sequences $(f_1(n))_{n \ge 0}, \ldots, (f_r(n))_{n \ge 0}$ and $k$
matrices $B_0, B_1, \ldots, B_{k-1}$ with entries in $R$ such that if

\[ V(n) = \begin{bmatrix} f_1(n) \\ \vdots \\ f_r(n) \end{bmatrix}, \]

then $V(kn+a) = B_a V(n)$ for $0 \le a < k$, and there exists a vector $z \in R^{1 \times r}$
such that $f(n) = z \cdot V(n)$.

(c) There exist a matrix-valued morphism $\mu : \Sigma_k^* \to R^{r \times r}$ and vectors $u, v$ with
entries in $R$, such that $\mu(0)v = v$ and $f(n) = u\mu(w^R)v$ for all $w \in \Sigma_k^*$
with $[w]_k = n$.

(d) There exist a matrix-valued morphism $\mu' : \Sigma_k^* \to R^{s \times s}$ and vectors $u', v'$ with
entries in $R$, such that $u' = u'\mu'(0)$ and $f(n) = u'\mu'(w)v'$ for all $w \in \Sigma_k^*$
with $[w]_k = n$.

(e) There is an $R$-recognizable series $d$ such that $(d, w) = f([w]_k)$ for all $w \in C_k$.

(f) The mapping $(h, w) := f([w]_k)$ defines an $R$-recognizable series.

(g) The mapping $(h', w) := f([w^R]_k)$ defines an $R$-recognizable series.

(h) There is an $R$-recognizable series $p$ such that $(p, w) = f([w^R]_k)$ for all
$w \in C_k^R$.



Proof. (a) $\Rightarrow$ (b): Since each $f_i(kn+a)$ is an $R$-linear combination of the $f_j$,
we can express this as the matrix product $V(kn+a) = B_a V(n)$. Since $f(n)$ is an
$R$-linear combination of the $f_i(n)$, we can express this as $f(n) = z \cdot V(n)$ for a
suitable vector $z$.

(b) $\Rightarrow$ (c): In fact we can take $v = V(0)$, $\mu(a) = B_a$ for $0 \le a < k$, and $u = z$.
Let us prove by induction on $n$ that $V(n) = \mu((n)_k^R)V(0)$. The base case is $n = 0$.
Then $(n)_k = \epsilon$, so $\mu((n)_k^R) = I$, the identity matrix, and $V(0) = I \cdot V(0)$.

Now assume the result is true for all $n' < n$, and we prove it for $n$. Write
$n = kn' + a$ for $0 \le a < k$. Then by induction $V(n') = \mu((n')_k^R)V(0)$. Then
$V(n) = V(kn'+a) = \mu(a)V(n') = \mu(a)\mu((n')_k^R)V(0) = \mu((n)_k^R)V(0)$.

We have $f(n) = zV(n)$. Furthermore, from $V(kn+a) = \mu(a)V(n)$ with
$n = 0$, $a = 0$, we get $v = \mu(0)v$.

Finally, if $w \in \Sigma_k^*$ is such that $[w]_k = n$, then $w^R = (n)_k^R 0^i$ for some $i \ge 0$.
Because $v = \mu(0)^i v$, we have $u\mu(w^R)v = u\mu((n)_k^R)\mu(0)^i v = u\mu((n)_k^R)v = f(n)$.

(c) $\Rightarrow$ (d): Let $\mu'(i) := \mu(i)^T$, $u' := v^T$, and $v' := u^T$. Then from (c) we get
$f(n) = u'\mu'((n)_k)v'$. Furthermore, from $v = \mu(0)v$ we get $v^T = v^T\mu(0)^T$, and so

\[ u' = u'\mu'(0). \tag{3} \]

Let $w$ be any word such that $[w]_k = n$. Then we can write $w = 0^i (n)_k$ for some
$i \ge 0$. Then $u' = u'\mu'(0)^i$ from (3), and hence $f(n) = u'\mu'(w)v'$, as desired.

(d) $\Rightarrow$ (e): We can take $d = (u', \mu', v')$.

(e) $\Rightarrow$ (f): Let $d$ be an $R$-recognizable series such that $(d, w) = f([w]_k)$ for all
words $w \in C_k$. Now apply Lemma 9; we obtain a new $R$-recognizable series $h$ with
$(h, 0^i x) = (d, x)$ for all $i \ge 0$ and all $x \in C_k$. Let $w \in \Sigma_k^*$. Then we can write $w = 0^j x$,
where $x \in C_k$. Then $(h, w) = (h, 0^j x) = (d, x) = f([x]_k) = f([0^j x]_k) = f([w]_k)$.

(f) $\Rightarrow$ (g): Using Lemma 8, if $h'$ is the series defined by $(h', w) = (h, w^R)$,
then $h'$ is also $R$-recognizable. We have $(h', w) = (h, w^R) = f([w^R]_k)$.

(g) $\Rightarrow$ (h): Trivial.

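(h) $\Rightarrow$ (a): By Lemma 10 there exists an $R$-recognizable series $p'$ with linear
representation $(c', \gamma', d')$ such that $(p', w0^i) = (p, w)$ for all $i \ge 0$ and all $w \in C_k^R$.
Furthermore, $\gamma'(0)d' = d'$.

Define the sequences $(f_i(n))_{n \ge 0}$ as follows:

\[ \begin{bmatrix} f_1(n) \\ f_2(n) \\ \vdots \\ f_t(n) \end{bmatrix} = \gamma'((n)_k^R) \cdot d'. \]

Then

\[ \begin{bmatrix} f_1(kn+a) \\ f_2(kn+a) \\ \vdots \\ f_t(kn+a) \end{bmatrix} = \gamma'((kn+a)_k^R) \cdot d'. \tag{4} \]

If $(a, n) \ne (0, 0)$ then

\begin{align*}
\gamma'((kn+a)_k^R) \cdot d' &= \gamma'(a \cdot (n)_k^R) \cdot d' \\
&= \gamma'(a)\gamma'((n)_k^R) \cdot d' \\
&= \gamma'(a) \begin{bmatrix} f_1(n) \\ f_2(n) \\ \vdots \\ f_t(n) \end{bmatrix},
\end{align*}

which expresses each $f_i(kn+a)$ as a linear combination of $f_1(n), f_2(n), \ldots, f_t(n)$.
If $(a, n) = (0, 0)$, then from (4) we get

\[ \begin{bmatrix} f_1(kn+a) \\ f_2(kn+a) \\ \vdots \\ f_t(kn+a) \end{bmatrix}
= \gamma'((0)_k^R) \cdot d' = \gamma'(\epsilon) \cdot d' = d' = \gamma'(0) \cdot d'
= \gamma'(0) \begin{bmatrix} f_1(n) \\ f_2(n) \\ \vdots \\ f_t(n) \end{bmatrix}. \]

Furthermore,

\[ f(n) = (p, (n)_k^R) = (p', (n)_k^R) = c' \cdot \gamma'((n)_k^R) \cdot d'
= c' \cdot \begin{bmatrix} f_1(n) \\ f_2(n) \\ \vdots \\ f_t(n) \end{bmatrix}, \]

which expresses $f(n)$ as a linear combination of $f_1(n), f_2(n), \ldots, f_t(n)$. □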

We can now formally define k-regular sequences over a semiring.

Definition 12. Suppose $R$ is a semiring, and $f : \mathbb{N} \to R$ is a sequence with values
in $R$. If any of the conditions (a)-(h) in Theorem 11 hold, then we say that $f$ is
$(R, k)$-regular.
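Condition (b) of Theorem 11 doubles as an evaluation scheme: to compute $f(n)$ one multiplies one matrix per base-$k$ digit of $n$. The Python sketch below illustrates this for $f = s_2$ with the assumed choice $V(n) = [s_2(n), 1]^T$, so that $B_0$ is the identity, $B_1$ adds the constant component to the first one, and $z = [1, 0]$. These matrices and all names are assumptions made for the example.

```python
B = {0: [[1, 0], [0, 1]], 1: [[1, 1], [0, 1]]}  # V(2n + a) = B[a] V(n)
Z = [1, 0]                                      # f(n) = Z . V(n)
V0 = [0, 1]                                     # V(0) = [s_2(0), 1]


def mat_vec(M, x):
    """Matrix-vector product with M given as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def V(n, k=2):
    """Evaluate V(n) through the recurrence V(kn' + a) = B[a] V(n')."""
    if n == 0:
        return list(V0)
    return mat_vec(B[n % k], V(n // k, k))


def f(n):
    return sum(z * x for z, x in zip(Z, V(n)))


print(f(27), V(27))
```

Note that $B_0 V(0) = V(0)$ holds for this choice, which is the condition $\mu(0)v = v$ appearing in part (c).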

Corollary 13. If $f$ is a sequence such that $f(n)$ is an $R$-linear combination of
some finite subset of its k-kernel, then it is $(R, k)$-regular.

Proof. Follows from Theorem 11 (a). □

However, unlike the case of $(\mathbb{Z}, k)$- or $(\mathbb{Q}, k)$-regular sequences, the converse to
Corollary 13 does not hold, as we have seen above in Example 7.

6. N-recognizable series 

In this section, our semiring is $R = \mathbb{N}$, the non-negative integers. We prove a
characterization of $\mathbb{N}$-recognizable series in terms of automata and transducers
(Theorem 14 below).

We recall the notion of nondeterministic finite automaton (NFA): it is a 5-tuple
$M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a set of states, $\Sigma$ is a finite alphabet, $q_0$ is the initial
state, $F \subseteq Q$ is the set of final states, and $\delta : Q \times \Sigma \to 2^Q$ is the transition
function, extended to $Q \times \Sigma^*$ in the obvious way. A path labeled $w = a_1 \cdots a_n$
in an NFA is a sequence of states $(p_0, p_1, \ldots, p_n)$ such that $p_{i+1} \in \delta(p_i, a_{i+1})$ for
$0 \le i < n$. It is an accepting path if $p_0 = q_0$ and $p_n \in F$.

We will also be concerned with nondeterministic uniform finite-state transducers.
Such a transducer produces an output of the same length for every input symbol.
Formally, such a transducer is $T = (Q, \Sigma, \Delta, E, q_0, F)$, where $E \subseteq Q \times \Sigma \times \Delta^* \times Q$ is the
set of permissible transitions. A transition $(q_i, a, y, q_j)$ means that if the transducer
is in state $q_i$ then on input $a$ it has the option (nondeterministically) of outputting
$y$ and entering state $q_j$. The output of $T$ on input $w$ is the set of all words formed
by concatenating the outputs on a path labeled $w$ from $q_0$ to some state of $F$.

Finally, given words $w = a_1 a_2 \cdots a_n$ and $x = b_1 b_2 \cdots b_n$ of the same length, but
defined over possibly different alphabets (say, $\Sigma$ and $\Delta$, respectively), we define the
word $w \times x$ to be the word $z = [a_1, b_1][a_2, b_2] \cdots [a_n, b_n]$ over the alphabet $\Sigma \times \Delta$.
In this case, we define the projection maps $\pi_1(z) = w$ and $\pi_2(z) = x$.

Theorem 14. Let $f : \Sigma^* \to \mathbb{N}$ be a formal series with $(f, \epsilon) = 0$. Then the following
are equivalent.

(a) $f$ is $\mathbb{N}$-recognizable.

(b) There exists an NFA $M = (Q, \Sigma, \delta, q_0, F)$ such that for all $w \in \Sigma^*$, there
are exactly $(f, w)$ paths labeled $w$ from $q_0$ to a state of $F$.

(c) There is an alphabet $\Delta$ and a regular language $L \subseteq (\Sigma \times \Delta)^*$ such that

\[ (f, w) = |\{ z \in L : \pi_1(z) = w \}| \]

for all words $w$.

(d) There is an alphabet $\Delta$ and a (nondeterministic) 1-uniform finite-state
transducer $T : \Sigma^* \to \Delta^*$ such that $(f, w) = |T(w)|$ for all words $w$.



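Proof. (a) $\Rightarrow$ (b): Since $f$ is recognizable, there is a matrix representation
$(u, \mu, v)$ such that $(f, w) = u\mu(w)v$ for all $w \in \Sigma^*$. By an exercise in [31], Ex.
III.3.3, p. 426, we can, without loss of generality, assume that
$u = [1\ \overbrace{0 \cdots 0}^{n-1}]$ and $v = [\overbrace{0 \cdots 0}^{n-1}\ 1]^T$ for some $n \ge 2$.
For completeness, we give the proof here:

Given a rank-$t$ representation $(u, \mu, v)$, we produce a new rank-$(t+2)$
representation $(u', \mu', v')$ defined as follows:

\[
u' = [1\ \overbrace{0 \cdots 0}^{t+1}], \qquad
\mu'(a) = \begin{bmatrix} 0 & u \cdot \mu(a) & u \cdot \mu(a) \cdot v \\ 0 & \mu(a) & \mu(a) \cdot v \\ 0 & 0 \cdots 0 & 0 \end{bmatrix}, \qquad
v' = [\overbrace{0 \cdots 0}^{t+1}\ 1]^T.
\]

Now an easy induction on $|w|$ shows that, for $|w| \ge 1$,

\[ \mu'(w) = \begin{bmatrix} 0 & u \cdot \mu(w) & u \cdot \mu(w) \cdot v \\ 0 & \mu(w) & \mu(w) \cdot v \\ 0 & 0 \cdots 0 & 0 \end{bmatrix}. \]

It follows that, for $w \ne \epsilon$, $u'\mu'(w)v' = u\mu(w)v$. For $w = \epsilon$, we have $u'\mu'(w)v' = 0$.
This completes the proof of the exercise.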
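The normalization step can be tested on a small example. The Python sketch below builds the rank-$(t+2)$ representation with first row $[0,\ u\mu(a),\ u\mu(a)v]$, middle rows $[0,\ \mu(a),\ \mu(a)v]$ and a zero last row, then compares it with the original on all short nonempty words. The sample matrices are arbitrary assumptions chosen so that $uv \ne 0$; the function names are illustrative.

```python
from itertools import product


def evaluate(u, mu, v, w):
    """Compute u * mu(w) * v by pushing a row vector through the matrices."""
    row = list(u)
    for letter in w:
        M = mu[letter]
        row = [sum(row[i] * M[i][j] for i in range(len(row))) for j in range(len(M[0]))]
    return sum(r * x for r, x in zip(row, v))


def normalize(u, mu, v):
    """Return (u', mu', v') of rank t + 2 with u' = e_1 and v' = e_{t+2}."""
    t = len(u)
    mu2 = {}
    for a, M in mu.items():
        uM = [sum(u[i] * M[i][j] for i in range(t)) for j in range(t)]
        Mv = [sum(M[i][j] * v[j] for j in range(t)) for i in range(t)]
        uMv = sum(uM[j] * v[j] for j in range(t))
        rows = [[0] + uM + [uMv]]                                # first block row
        rows += [[0] + list(M[i]) + [Mv[i]] for i in range(t)]   # middle block rows
        rows.append([0] * (t + 2))                               # zero last row
        mu2[a] = rows
    return [1] + [0] * (t + 1), mu2, [0] * (t + 1) + [1]


# Assumed sample representation over {0, 1}; note that u . v = 2 here.
U, V = [1, 1], [1, 1]
MU = {"0": [[1, 0], [0, 2]], "1": [[0, 1], [1, 1]]}
U2, MU2, V2 = normalize(U, MU, V)

agree = all(
    evaluate(U, MU, V, "".join(w)) == evaluate(U2, MU2, V2, "".join(w))
    for length in range(1, 6)
    for w in product("01", repeat=length)
)
print(agree, evaluate(U, MU, V, ""), evaluate(U2, MU2, V2, ""))
```

The two representations agree on nonempty words and differ only on the empty word, where the normalized one always gives 0; this is why Theorem 14 assumes $(f, \epsilon) = 0$.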

Now that this somewhat technical point has been handled, we turn to the idea
of the construction. The desired interpretation is that $\mu(w)_{i,j}$ should count the
number of paths labeled $w$ from state $i$ to state $j$. However, this is not sensible if $w = a$
is a single symbol, as there is either one directed edge in the automaton from $i$ to
$j$ labeled $a$, or none. To get around this problem, we make multiple copies of each
state, and create a transition from $i$ to $\mu(a)_{i,j}$ copies of state $j$.

From the rank-$n$ linear representation for $f$, namely $(u, \mu, v)$, we create an NFA
$M = (Q, \Sigma, \delta, q_0, F)$ with $(f, w)$ paths labeled $w$. Let $m$ be the maximum entry in
all the $\mu(a)$, $a \in \Sigma$. Define

\begin{align*}
Q &= \{ [i, j] : 1 \le i \le n,\ 1 \le j \le m \} \\
q_0 &= [1, 1] \\
F &= \{ [n, s] : 1 \le s \le m \} \\
\delta([i, j], a) &= \{ [r, s] : 1 \le r \le n,\ 1 \le s \le \mu(a)_{i,r} \},
\end{align*}

where by $\mu(a)_{i,r}$ we mean the entry in row $i$ and column $r$ of the matrix $\mu(a)$.
To see that this works, let $P_{i,j,r}(w)$ denote the number of paths labeled $w$ from
$[i, j]$ to some member of $\{ [r, s] : 1 \le s \le m \}$. We claim that

\[ P_{i,j,r}(w) = \mu(w)_{i,r} \tag{6} \]

for all $i, j, r, w$ such that $1 \le i, r \le n$, $1 \le j \le m$, and $w \in \Sigma^*$.

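The proof is by induction on $|w|$. The base case is $|w| = 0$. In this case

\[ \mu(w)_{i,r} = \begin{cases} 1, & \text{if } i = r; \\ 0, & \text{otherwise,} \end{cases} \]

and the only path of length 0 from state $[i, j]$ is to itself, so $P_{i,j,r}(w) = \mu(w)_{i,r}$.

Now assume (6) holds for all $|w'| < |w|$; we prove it for $w$. Write $w = ax$ with
$a \in \Sigma$. Break the path labeled $w$ into two pieces, one labeled $a$ and the other labeled
$x$. Then

\begin{align*}
P_{i,j,r}(w) &= \sum_{[i',j'] \in \delta([i,j],a)} P_{i',j',r}(x) \\
&= \sum_{[i',j'] \in \delta([i,j],a)} \mu(x)_{i',r} \quad \text{(by induction)} \\
&= \sum_{1 \le i' \le n} \mu(a)_{i,i'}\, \mu(x)_{i',r} \\
&= \mu(ax)_{i,r},
\end{align*}

which completes the induction.

Thus $u \cdot \mu(w) \cdot v = \mu(w)_{1,n} = P_{1,1,n}(w)$, as desired.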
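The state-copying construction lends itself to a direct experiment: build the NFA from a normalized representation and count accepting paths by dynamic programming. In the Python sketch below, the sample representation (with $(f, w)$ equal to twice the number of 1's in $w$, so that some matrix entry exceeds 1) is an assumption chosen for illustration, as are the function names.

```python
from itertools import product


def build_nfa(mu, n):
    """States are pairs (i, j); reading a from (i, j) leads to mu(a)[i][r] copies of r."""
    m = max(max(max(row) for row in M) for M in mu.values())
    states = [(i, j) for i in range(1, n + 1) for j in range(1, m + 1)]
    delta = {}
    for (i, j) in states:
        for a, M in mu.items():
            delta[(i, j), a] = [
                (r, s) for r in range(1, n + 1) for s in range(1, M[i - 1][r - 1] + 1)
            ]
    start = (1, 1)
    final = {(n, s) for s in range(1, m + 1)}
    return delta, start, final


def count_accepting_paths(delta, start, final, w):
    """Number of paths labeled w from the start state to a final state."""
    counts = {start: 1}
    for a in w:
        nxt = {}
        for q, c in counts.items():
            for p in delta[q, a]:
                nxt[p] = nxt.get(p, 0) + c
        counts = nxt
    return sum(c for q, c in counts.items() if q in final)


# Assumed normalized representation: u = [1, 0], v = [0, 1]^T, n = 2.
MU = {"0": [[1, 0], [0, 1]], "1": [[1, 2], [0, 1]]}
DELTA, START, FINAL = build_nfa(MU, 2)

counts_match = all(
    count_accepting_paths(DELTA, START, FINAL, "".join(w)) == 2 * w.count("1")
    for length in range(0, 6)
    for w in product("01", repeat=length)
)
print(count_accepting_paths(DELTA, START, FINAL, "11"), counts_match)
```

The second copy of state 2 exists only because $\mu(1)_{1,2} = 2$; both copies behave identically on outgoing transitions, as the definition of $\delta$ requires.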

(b) $\Rightarrow$ (c): Given the NFA $M = (Q, \Sigma, \delta, q_0, F)$, we take $\Delta = Q$ and define

\[ L = \{ w \times x : w \in \Sigma^*,\ x \in \Delta^* \text{ such that } q_0 x \text{ is an accepting path for } w \}. \]

Since there are $(f, w)$ accepting paths for $w$ in $M$, and each such path begins
with $q_0$, the result follows. Clearly $L$ is regular, as it can be accepted by a simple
modification of $M$.



(c) $\Rightarrow$ (d): Consider a DFA $M$ accepting $L$. We construct a transducer $T$ with
the same set of states, initial state, and set of final states as $M$. For each transition
in $M$ of the form $\delta(q_i, [a, b]) = q_j$, we define a transition in $T$ from $q_i$ to $q_j$ with
input $a$ and output $b$. It follows that on input $w$, the transducer $T$ outputs all those
$x$ of the same length for which $w \times x \in L$.

(d) $\Rightarrow$ (a): Given such a transducer $T = (Q, \Sigma, \Delta, E, q_1, F)$, we define the matrix
representation $(u, \mu, v)$ for the series $f$ as follows: if $Q = \{q_1, \ldots, q_n\}$, then
$u = [1\ \overbrace{0 \cdots 0}^{n-1}]$ and $v$ has a 1 in the entries corresponding to final states of $F$, and 0
elsewhere. Since $|T(\epsilon)| = 0$ by hypothesis, it must be that $q_1 \notin F$. Now define
$\mu(a)_{i,j}$ to be the number of symbols $b$ such that $(q_i, a, b, q_j) \in E$. □

Open Problem 15. The preceding theorem would be true if we replace the 1- 
uniform finite-state transducer with any transducer where no two different paths 
labeled w give the same output. One way to ensure this is that the output labels 
form a code, and this is clearly true if the outputs are all of the same length. What 
happens if the output labels do not form a code? Is the result still true? 

Remark 16. Carpi and Maggi [13] defined the class of $k$-synchronized sequences, a class which contains the $k$-automatic sequences and is properly contained in the class of $k$-regular sequences. A sequence $(u_n)_{n \ge 0}$ is $k$-synchronized if the relation $\{((n)_k, (u_n)_k) : n \ge 0\}$ is a right-synchronized rational relation. Roughly speaking, this means that the relation is realized by a length-preserving rational transduction, except that we also permit the presence of "padding" symbols at the end of one or the other component of the input. Our transducer-based characterization, combined with Theorem 11, characterizes the more general class of $k$-regular sequences.

In the usual case where $|\Sigma| \ge 2$, we can take $\Delta$ to be $\Sigma^\ell$ for a suitable $\ell$, as the following theorem shows.

Theorem 17. Let $f : \Sigma^* \to \mathbb{N}$ be a formal series with $|\Sigma| \ge 2$ and $(f, \epsilon) = 0$. Then the following are equivalent.

(a) $f$ is $\mathbb{N}$-recognizable.

(b) There is an integer $\ell \ge 1$ and a regular language $L \subseteq (\Sigma \times \Sigma^\ell)^*$ such that

$$(f, w) = |\{z \in L : \pi_1(z) = w\}|$$

for all words $w$.

(c) There is an integer $\ell \ge 1$ and a (nondeterministic) $\ell$-uniform finite-state transducer $T : \Sigma^* \to \Sigma^*$ such that $(f, w) = |T(w)|$.



Proof. Just like the proof of Theorem 14. The only difference is that we need to choose $\ell$ large enough so that $|\Sigma|^\ell \ge |\Delta|$; then we just use elements of $\Sigma^\ell$ instead of those in $\Delta$. □

7. $\mathbb{N}_\infty$-recognizable series

In this section, we consider the case where the underlying semiring is $R = \mathbb{N}_\infty$, the extended non-negative integers. Roughly speaking, this extension corresponds to the case where a nondeterministic finite automaton or transducer is extended by allowing $\epsilon$-transitions.

In addition to the usual interpretation for addition and multiplication of natural numbers, we need the following additional rules that turn $\mathbb{N}_\infty$ into a semiring:

(i) $a + \infty = \infty + a = \infty$ for all $a \in \mathbb{N}_\infty$;

(ii) $a \cdot \infty = \infty \cdot a = \infty$ for all $a \ne 0$;

(iii) $0 \cdot \infty = \infty \cdot 0 = 0$.

Matrices and vectors with entries in $\mathbb{N}_\infty$ can now be multiplied using the usual rules for such multiplication, in addition to the rules (i)-(iii), as needed.
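
Rules (i)-(iii) are easy to mirror in code. The following is a minimal sketch, assuming that `math.inf` stands in for the element $\infty$ and that matrices are plain lists of lists; the function names are ours and purely illustrative. The only rule that needs special handling is (iii), since floating-point arithmetic would otherwise evaluate $0 \cdot \infty$ to NaN.

```python
import math

INF = math.inf  # stands for the extra element "infinity" of N_infinity


def add(a, b):
    """Addition in N_infinity; rule (i) is what adding math.inf already does."""
    return a + b


def mul(a, b):
    """Multiplication in N_infinity, with rule (iii): 0 * inf = inf * 0 = 0."""
    if a == 0 or b == 0:
        return 0
    return a * b


def mat_mul(x, y):
    """Matrix product over N_infinity, built from add and mul above."""
    rows, inner, cols = len(x), len(y), len(y[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for k in range(inner):
                total = add(total, mul(x[i][k], y[k][j]))
            result[i][j] = total
    return result
```

For instance, `mat_mul([[1, INF]], [[0], [0]])` evaluates every product against a zero, so rule (iii) applies to the second term and the result is the $1 \times 1$ zero matrix.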

Let $L$ be a regular language. The characteristic series of $L$, denoted $\chi_L$, is the formal series such that $(\chi_L, w) = 1$ if $w \in L$ and $0$ otherwise. The Hadamard product of two series, $h \odot h'$, is the term-by-term product, $(h \odot h')(w) = h(w)h'(w)$. The essential lemma is the following:

Lemma 18. Given a recognizable formal series $f$ over $\mathbb{N}_\infty$, we can express it as

$$f = \chi_{\overline{L}} \odot g + \chi_L \cdot \infty,$$

where $g$ is a recognizable formal series over $\mathbb{N}$ and $L$ is a regular language. Furthermore, in the sum, we never add $\infty$ to a value other than $0$.

Proof. There are two main ideas. The first is that the language $L = \{w : (f, w) = \infty\}$ is regular. The second is that $g$ can be taken to be a modification of $f$ with all occurrences of $\infty$ removed.

First, the construction of $L$. This is essentially that given in Salomaa and Soittola [32], p. 40, Exercise 5.

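We create a new finite semiring $R' = \{0, p, \infty\}$ where the addition and multiplication rules are given as follows:

$$
\begin{array}{c|ccc}
+ & 0 & p & \infty \\ \hline
0 & 0 & p & \infty \\
p & p & p & \infty \\
\infty & \infty & \infty & \infty
\end{array}
\qquad\qquad
\begin{array}{c|ccc}
\cdot & 0 & p & \infty \\ \hline
0 & 0 & 0 & 0 \\
p & 0 & p & \infty \\
\infty & 0 & \infty & \infty
\end{array}
$$
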

Here $0, p, \infty$ should be treated as formal symbols, but the intent is that the symbol $p$ denotes "some positive integer". Now we define a morphism of semirings $\tau : \mathbb{N}_\infty \to R'$ as follows:

$$\tau(0) = 0, \qquad \tau(i) = p \ \text{ for } 0 < i < \infty, \qquad \tau(\infty) = \infty.$$

It is now easy to check that for all $a, b \in \mathbb{N}_\infty$ we have $\tau(ab) = \tau(a) \cdot \tau(b)$ and $\tau(a + b) = \tau(a) + \tau(b)$, where the operations on the right-hand side are those in $R'$. We extend $\tau$ to apply to vectors and matrices by applying $\tau$ to each entry.
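
The morphism property can be spot-checked mechanically. The following is a minimal sketch, assuming the three elements of $R'$ are encoded as the strings `"0"`, `"p"`, `"inf"` and that $\infty$ is encoded as `math.inf`; these encodings and all names are illustrative choices of ours. It verifies both identities on a small sample of values, which is a sanity check rather than a proof.

```python
import math

INF = math.inf
ZERO, P, TOP = "0", "p", "inf"  # the three formal symbols of R'


def tau(a):
    """The map tau from N_infinity to R'."""
    if a == 0:
        return ZERO
    return TOP if a == INF else P


def add_r(x, y):
    """Addition in R': 0 is neutral, inf absorbs, and p + p = p."""
    if x == ZERO:
        return y
    if y == ZERO:
        return x
    return TOP if TOP in (x, y) else P


def mul_r(x, y):
    """Multiplication in R': 0 annihilates; otherwise inf absorbs, and p * p = p."""
    if ZERO in (x, y):
        return ZERO
    return TOP if TOP in (x, y) else P


def mul_inf(a, b):
    """Multiplication in N_infinity with the convention 0 * inf = 0."""
    return 0 if a == 0 or b == 0 else a * b


# Check tau(a + b) = tau(a) + tau(b) and tau(ab) = tau(a) tau(b) on sample values.
samples = list(range(6)) + [INF]
is_morphism = all(
    tau(a + b) == add_r(tau(a), tau(b))
    and tau(mul_inf(a, b)) == mul_r(tau(a), tau(b))
    for a in samples for b in samples
)
```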

Next, we consider the formal series $f' := \tau \circ f$, which takes its values in $R'$. It follows from the above that, if $(u, \mu, v)$ is a linear representation for $f$, then $f'(w) = \hat{u}\,\hat{\mu}(w)\,\hat{v}$, where $\hat{u} = \tau(u)$, $\hat{v} = \tau(v)$, and $\hat{\mu} = \tau \circ \mu$.

Now we can create a deterministic finite automaton $M = (Q, \Sigma, \delta, q_0, F)$ that essentially computes the series $f'$. We do this by letting $Q$ be the set of all possible $1 \times n$ row vectors over $R'$, letting $q_0 = \hat{u}$, and defining the transitions $\delta(q, a) = q \cdot \hat{\mu}(a)$. If we define $\varphi(t) = t \cdot \hat{v}$, then an easy induction gives that $\delta(q_0, w) = \hat{u} \cdot \hat{\mu}(w)$, and hence $\varphi(\delta(q_0, w)) = \hat{u} \cdot \hat{\mu}(w) \cdot \hat{v} = f'(w)$, as desired.

We can now define $L$. Let $F$, the set of final states of $M$, be given by

$$F = \{t \in Q : t \cdot \hat{v} = \infty\}.$$

Then $L = L(M)$. By construction, we have the following equivalences: $w \in L \iff (f', w) = \infty \iff (f, w) = \infty$.

Now we turn to the construction of $g$. Let $(u, \mu, v)$ be a linear representation for $f$. Define a map $\xi : \mathbb{N}_\infty \to \mathbb{N}$ as follows:

$$\xi(i) = \begin{cases} i, & \text{if } i \in \mathbb{N}; \\ 0, & \text{if } i = \infty, \end{cases}$$

and extend $\xi$ to apply element-by-element to vectors and matrices in the obvious way. Let $g = (u', \mu', v')$, where $u' = \xi(u)$, $\mu'(a) = \xi(\mu(a))$ for each $a \in \Sigma$, and $v' = \xi(v)$. The series $g$ is created by replacing each occurrence of $\infty$ in $u$, $\mu$, and $v$ with $0$. Then $g$ is evidently $\mathbb{N}$-recognizable, and we claim that $(f, w) \ne \infty \implies (f, w) = (g, w)$. To see this, note that if $(f, w) \ne \infty$, then in the calculation $u \cdot \mu(w) \cdot v$ any occurrences of $\infty$ that arise must eventually be multiplied by $0$, yielding $0$. Then replacing $\infty$ with $0$ has no effect, since any multiplication involving $0$ will also yield $0$. (Note that we are not claiming anything about those $w$ for which $(f, w) = \infty$; the corresponding values of $g$ could be anything.)

It now follows that $f = \chi_{\overline{L}} \odot g + \chi_L \cdot \infty$. □

We get the following two corollaries.

Corollary 19. If $f : \Sigma^* \to \mathbb{N}$ is an $\mathbb{N}_\infty$-recognizable series, then it is $\mathbb{N}$-recognizable.

Proof. From Lemma 18, we have $f = \chi_{\overline{L}} \odot g + \chi_L \cdot \infty$, where $g$ is an $\mathbb{N}$-recognizable formal series and $L$ is a regular language. Since $f(\Sigma^*) \subseteq \mathbb{N}$, it follows from the proof of Lemma 18 that we may choose $L = \emptyset$. So $f = g$. □

Corollary 20. Given a recognizable formal series $f$ over $\mathbb{N}_\infty$, with linear representation $(u, \mu, v)$, there exists another linear representation $(p, \beta, q)$ such that the only entries equal to $\infty$ lie in $p$.

Proof. First, by a well-known result on the Hadamard product (e.g., [33] and [7], p. 15), the formal series $g' := \chi_{\overline{L}} \odot g$ constructed in Lemma 18 is recognizable (over $\mathbb{N}$). So $g'$ has a linear representation $(u, \mu, v)$ that contains no entries of $\infty$. Similarly, since $L$ is a regular language, the characteristic series $\chi_L$ has a linear representation $(r, \alpha, s)$, where the entries of $r$, $\alpha$, and $s$ are all either $0$ or $1$. From this we can form a new linear representation $(p, \beta, q)$ for $f$, via a direct sum construction, as follows:

$$p = [\,u \;\; \infty \cdot r\,], \qquad \beta(a) = \begin{bmatrix} \mu(a) & 0 \\ 0 & \alpha(a) \end{bmatrix}, \qquad q = \begin{bmatrix} v \\ s \end{bmatrix}.$$

Here, $0$ represents a matrix of 0's of the appropriate size.

A routine induction shows that $\beta(w)$ contains $\mu(w)$ in the upper left and $\alpha(w)$ in the lower right, from which the result follows. □

8. Characterizations of $\mathbb{N}_\infty$-recognizable series

Just as the $\mathbb{N}$-recognizable series have a number of different interpretations in terms of automata and transducers, as we saw in Section 5, so do the $\mathbb{N}_\infty$-recognizable series; the difference is that we need to allow $\epsilon$-transitions.

Theorem 21. Let $f : \Sigma^* \to \mathbb{N}_\infty$ be a formal series. Then the following are equivalent:

(a) $f$ is $\mathbb{N}_\infty$-recognizable;

(b) There exists an NFA-$\epsilon$ $M = (Q, \Sigma, \delta, q_0, F)$ such that, for all $w \in \Sigma^*$, there are exactly $(f, w)$ paths labeled $w$ from $q_0$ to a state of $F$;

(c) There is an alphabet $\Delta$, a symbol $B \notin \Sigma$, and a regular language $L \subseteq ((\Sigma \cup \{B\}) \times \Delta)^*$ such that

$$(f, w) = |\{z \in L : \tau(\pi_1(z)) = w\}|,$$

where $\tau$ is the morphism that maps $a$ to $a$ for $a \in \Sigma$ and $B$ to $\epsilon$;

(d) There is an alphabet $\Delta$, a symbol $B \notin \Sigma$, and a regular language $L \subseteq ((\Sigma \cup \{B\}) \times \Delta)^*$ such that

$$(f, w) = |\{z \in L : \pi_1(z) \in wB^*\}|;$$

(e) There is an alphabet $\Delta$ and a nondeterministic finite-state transducer $T$, with inputs of a single letter or $\epsilon$ on every transition, and outputs of a single letter on every transition, such that $(f, w) = |T(w)|$.

Proof. (a) $\Rightarrow$ (d): By Lemma 18 we know that $f = \chi_{\overline{L_1}} \odot g + \chi_{L_1} \cdot \infty$, where $L_1 \subseteq \Sigma^*$ is a regular language and $g$ is an $\mathbb{N}$-recognizable series. Define $g' := \chi_{\overline{L_1} - \{\epsilon\}} \odot g$; then $g'$ is an $\mathbb{N}$-recognizable series with $(g', \epsilon) = 0$, so we can apply the implication (a) $\Rightarrow$ (c) in Theorem 14 to $g'$ to get an alphabet $\Delta$ and a regular language $L_2 \subseteq (\Sigma \times \Delta)^*$ such that

$$(g', w) = |\{z \in L_2 : \pi_1(z) = w\}|$$

for all words $w \ne \epsilon$. Let $a$ be an arbitrarily chosen, fixed symbol of $\Delta$, and consider the language $L_3$ defined by

$$L_3 = \Bigl( \bigcup_{0 \le i < (f, \epsilon)} [B, a]^i \Bigr) \cup \{z \in ((\Sigma \cup \{B\}) \times \Delta)^* : \pi_1(z) \in (L_1 - \{\epsilon\}) \cdot B^* \text{ and } \pi_2(z) \in a^*\}.$$

It is easy to see that $L_3$ is regular, as each term of the big union is regular. For the second term, we can, given a DFA for $L_1 - \{\epsilon\}$, modify it by

• changing each transition on any letter $b$ to a transition on $[b, a]$,

• adding transitions out of each accepting state on $[B, a]$ to a new final state $q$, and

• adding a self-loop labeled $[B, a]$ from $q$ to itself.

Let $L := L_2 \cup L_3$. Then $L$ is regular and, by construction, $(f, w) = |\{z \in L : \pi_1(z) \in wB^*\}|$.

(d) $\Rightarrow$ (e): Given a DFA $M$ for $L$, say $M = (Q, \Sigma', \delta, q_0, F)$ where $\Sigma' = (\Sigma \cup \{B\}) \times \Delta$, we create the transducer $T$ with the same set of states, initial state, and set of final states as $M$. For each transition in $M$ of the form $\delta(q_i, [a, b]) = q_j$, we define a transition in $T$ from $q_i$ to $q_j$ with input $a$ and output $b$, except that if $a = B$, then we set the corresponding transition in $T$ to have input $\epsilon$. Each word that $M$ accepts, having first component in $wB^*$ and second component $y$, corresponds to an input $w$ of $T$ and an output of $y$. The result now follows.

(e) $\Rightarrow$ (c): The construction of the previous paragraph is completely reversible, which shows that (e) $\Rightarrow$ (d). But clearly (d) $\Rightarrow$ (c).

(c) $\Rightarrow$ (b): Let $L \subseteq ((\Sigma \cup \{B\}) \times \Delta)^*$ be a regular language such that $(f, w) = |\{z \in L : \tau(\pi_1(z)) = w\}|$, and let $M = (Q, \Sigma', \delta, q_0, F)$ be a DFA accepting $L$, where $\Sigma' = (\Sigma \cup \{B\}) \times \Delta$. We now create an NFA-$\epsilon$ $M'$ with the desired property, by modifying $M$, as follows: first, the set of states is expanded from $Q$ to $Q \times \Delta$. Second, if $M$ has a transition $\delta(q_i, [a, b]) = q_j$ with $a \in \Sigma$, then $M'$ has transitions $\delta([q_i, c], a) = [q_j, b]$ for all $c \in \Delta$. Similarly, if $M$ has a transition $\delta(q_i, [B, b]) = q_j$, then $M'$ has a transition $\delta([q_i, c], \epsilon) = [q_j, b]$ for all $c \in \Delta$. The initial state is $[q_0, c]$ for some arbitrary element $c \in \Delta$, and the set of final states of $M'$ is $F \times \Delta$. The formal proof that this works is essentially the proof of (a) $\Rightarrow$ (b) in Theorem 14 and is omitted.

(b) $\Rightarrow$ (a): Given the NFA-$\epsilon$ $M = (Q, \Sigma, \delta, q_0, F)$, we create some associated matrices $D_a$ for $a \in \Sigma \cup \{\epsilon\}$. If the set of states is $Q = \{q_0, q_1, \ldots, q_{n-1}\}$, then $D_a$ has a $1$ in row $i$ and column $j$ iff $\delta(q_i, a) = q_j$.

Now any finite path labeled $w = a_1 a_2 \cdots a_n$ in the transition diagram of $M$ looks like

$$\epsilon^{b_0} a_1 \epsilon^{b_1} a_2 \cdots \epsilon^{b_{n-1}} a_n \epsilon^{b_n}$$

for some $b_0, b_1, \ldots, b_n$ with $0 \le b_i < \infty$.

Let $D = \sum_{i \ge 0} D_\epsilon^i$; this is a matrix with possibly infinite entries. Then the entry in row $i$ and column $j$ of $D D_{a_1} D D_{a_2} \cdots D D_{a_n} D$ gives the number of paths labeled $w$ from state $q_i$ to state $q_j$ in $M$. If $u = [1\ 0\ \cdots\ 0]$ and $v$ is the $\{0, 1\}$-vector corresponding to the final states of $M$, then $u D D_{a_1} D D_{a_2} \cdots D D_{a_n} D v$ is the number of accepting paths labeled $w$.

If we now define $\mu(a) = D D_a$ for $a \in \Sigma$ and $v' = D v$, then $(u, \mu, v')$ is a linear representation for $f$. □
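
The computation in (b) $\Rightarrow$ (a) can be sketched as follows, assuming $\infty$ is encoded as `math.inf`. The helper names and the two small example automata are hypothetical. To evaluate $D = \sum_{i \ge 0} D_\epsilon^i$ in finite time, the sketch uses an observation that is ours rather than the paper's: an $\epsilon$-path of length at least $n$ in an $n$-state automaton repeats a state, so the corresponding entry of $D$ is $\infty$, and such a path exists iff one exists with length between $n$ and $2n - 1$.

```python
import math

INF = math.inf  # plays the role of the element "infinity" of N_infinity


def mul(a, b):
    """Product in N_infinity, with the convention 0 * inf = inf * 0 = 0."""
    return 0 if a == 0 or b == 0 else a * b


def mat_mul(x, y):
    """Matrix product over N_infinity (x is r-by-m, y is m-by-c)."""
    return [[sum(mul(x[i][k], y[k][j]) for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def epsilon_star(d_eps):
    """Return D = sum over i >= 0 of d_eps**i, using inf for infinitely many paths."""
    n = len(d_eps)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(n - 1):  # epsilon-paths of length 1 .. n-1
        power = mat_mul(power, d_eps)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    for _ in range(n):  # lengths n .. 2n-1: such a path contains a cycle
        power = mat_mul(power, d_eps)
        for i in range(n):
            for j in range(n):
                if power[i][j] > 0:
                    total[i][j] = INF
    return total


def count_accepting_paths(word, d, d_eps, start, finals):
    """Evaluate u D D_{a_1} D D_{a_2} ... D_{a_n} D v over N_infinity."""
    n = len(d_eps)
    big_d = epsilon_star(d_eps)
    row = mat_mul([[int(j == start) for j in range(n)]], big_d)
    for letter in word:
        row = mat_mul(mat_mul(row, d[letter]), big_d)
    return sum(row[0][j] for j in finals)


# Hypothetical NFA-epsilon: 0 -a-> 1, 0 -a-> 2, 1 -eps-> 2, 2 -b-> 2; final state 2.
D = {"a": [[0, 1, 1], [0, 0, 0], [0, 0, 0]],
     "b": [[0, 0, 0], [0, 0, 0], [0, 0, 1]]}
D_EPS = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
# The same automaton with an additional epsilon-loop on state 1.
D_EPS_LOOP = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
```

In the first automaton the word `"a"` has two accepting paths (directly to state 2, or through state 1 and an $\epsilon$-move). With the extra $\epsilon$-loop the loop can be traversed any number of times, so the count becomes infinite.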

9. Applications to enumeration 

Now that all the basic definitions and results are out of the way, we can resume our work on enumeration. The common theme in what follows is to show that some well-studied sequence is $k$-regular, by combining Theorem 14 or Theorem 21 (which characterize the formal series associated with counting the number of paths, or certain subsets of regular languages, or size of transduced sets, as $\mathbb{N}$- or $\mathbb{N}_\infty$-recognizable) with Theorem 11, which shows the equivalence between $k$-regular sequences and recognizable formal series. Here is a simple example:

Theorem 22. Let $E$ be any finite set of integers, and consider $(b(n))_{n \ge 0}$, the sequence that counts the number of reversed representations of $n$ in base $k$, where the digits are chosen only from $E$, and where reversed representations with trailing zeroes are not allowed. Then $(b(n))_{n \ge 0}$ is $(\mathbb{N}, k)$-regular.

Proof. We construct a transducer $T$ having $b(n)$ distinct outputs on input $(n)_k$. On input $(n)_k$, the transducer $T$ guesses a possible representation $w$ using only the digits of $E$, simultaneously "normalizes" it, digit-by-digit, to $w'$, and checks that the normalized representation is equal to the input. If it is, then $w$ is output. There are some details to handle if $w$ is shorter or longer than $(n)_k$. If $w$ is shorter, then we allow padding of $w$ with trailing zeroes. If $w$ is longer, then we handle this by permitting $T$ to perform $\epsilon$-transitions on the input after it has processed all the symbols of $(n)_k$.

Then, using Theorem 21 together with Theorem 11 and Corollary 19, we see that $(b(n))_{n \ge 0}$ is an $(\mathbb{N}, k)$-regular sequence. □

Example 23. Let $b_k(n)$ denote the number of representations of $n$ in base 2, using the digits $\{0, 1, \ldots, k-1\}$. Then $b_2(n) = 1$, from the uniqueness of binary representations, and $b_3(n)$ is the Stern-Brocot sequence evaluated at $n + 1$. From Theorem 22, we see that all these sequences are $(\mathbb{N}_\infty, 2)$-regular. See [29].
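
The two claims of Example 23 can be checked by brute force for small $n$. The following is a minimal sketch; the function names are ours, and the recurrence used for the Stern-Brocot (Stern diatomic) sequence, $s(0) = 0$, $s(1) = 1$, $s(2n) = s(n)$, $s(2n+1) = s(n) + s(n+1)$, is the standard one and is stated here as an assumption about which normalization the example intends. It counts representations directly rather than through a transducer.

```python
from functools import lru_cache


def count_representations(n, k):
    """Number of ways to write n as a sum of d_i * 2**i with digits d_i in {0, ..., k-1}
    and nonzero most significant digit (the empty word represents 0)."""

    @lru_cache(maxsize=None)
    def count(m):
        if m == 0:
            return 1  # only the empty representation
        # Choose the least significant digit d; the rest must represent (m - d) / 2.
        return sum(count((m - d) // 2) for d in range(min(k - 1, m) + 1)
                   if (m - d) % 2 == 0)

    return count(n)


@lru_cache(maxsize=None)
def stern(n):
    """Stern's diatomic sequence: s(0)=0, s(1)=1, s(2n)=s(n), s(2n+1)=s(n)+s(n+1)."""
    if n < 2:
        return n
    half = n // 2
    return stern(half) if n % 2 == 0 else stern(half) + stern(half + 1)
```

For example, $n = 4$ has the three representations $100$, $20$, and $12$ (most significant digit first) over the digits $\{0, 1, 2\}$, matching $s(5) = 3$.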

We now turn to our main enumeration results. 

Theorem 24. Let $x = a(0)a(1)a(2)\cdots$ be a $k$-automatic sequence. Let $b(n)$ be the number of distinct factors of length $n$ in $x$. Then $(b(n))_{n \ge 0}$ is an $(\mathbb{N}, k)$-regular sequence.

Proof. To count distinct factors of length $n$, we count the first occurrences of each factor.

The number of distinct factors of length $n$ in $x$ equals the number of indices $i$ such that there is no index $j < i$ with the factor of length $n$ beginning at position $i$ equal to the factor of length $n$ beginning at position $j$.

Consider the set

$$S = \{(n, i) : \text{for all } j \text{ with } 0 \le j < i \text{ there exists an integer } t \text{ with } 0 \le t < n \text{ such that } a(i + t) \ne a(j + t)\}.$$

Then, by Theorem 1, the language $S'$, defined to be the base-$k$ encoding of elements of $S$, forms a regular language. We assume without loss of generality that if one representation of $(n, i)$ appears in $S'$, then they all do, including the ones with leading (actually, trailing) zeroes.

We now apply a transducer to $S'$, changing every representation of $(n, i)$ as follows: we change every $0$ after the last nonzero digit in the first component to $B$. This transformation preserves the regularity of $S'$. Finally, we discard every representation that ends with $[B, 0]$. The effect of this is to ensure that $n$ in the first component, up to ignoring the $B$'s, has a single representation, and that each $i$ corresponding to a particular $n$ has a unique representation. Using Theorems 14 and 11, we see that $(b(n))_{n \ge 0}$ is $(\mathbb{N}, k)$-regular. □
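
The counting idea behind this proof (count the indices $i$ whose length-$n$ factor has no earlier occurrence) can be illustrated by brute force on a finite prefix. The following is a minimal sketch for the Thue-Morse sequence; the function names and the choice of prefix length are ours, and a finite prefix only reflects the infinite word for $n$ small enough that every length-$n$ factor already occurs in it.

```python
def thue_morse(length):
    """Prefix of the Thue-Morse word: t[n] is the parity of the number of 1 bits of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


def first_occurrence_count(x, n):
    """Count indices i such that x[i..i+n-1] differs from x[j..j+n-1] for all j < i."""
    count = 0
    for i in range(len(x) - n + 1):
        if all(x[j:j + n] != x[i:i + n] for j in range(i)):
            count += 1
    return count


def distinct_factor_count(x, n):
    """Direct count of the distinct length-n factors of x, for comparison."""
    return len({tuple(x[i:i + n]) for i in range(len(x) - n + 1)})
```

On a prefix of length 256, both functions return 2, 4, 6, 10 for $n = 1, 2, 3, 4$, the first values of the factor complexity of the Thue-Morse word.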

Remark 25. Mossé [27] proved, among other things, that a sequence that is the fixed point of a $k$-uniform morphism has a $k$-regular subword complexity function. With our technique, we obtain her result for these sequences and also the slightly more general case of $k$-automatic sequences.

Theorem 26. The sequence counting the number of palindromic factors of length $n$ is $(\mathbb{N}, k)$-regular.

Proof. The number of distinct palindromes of length $n$ in $x$

is equal to

the number of indices $i$ such that $x[i..i + n - 1]$ is a palindrome and $x[i..i + n - 1]$ does not appear previously in $x$

is equal to



the number of indices $i$ such that $x[i..i + n - 1] = x[i..i + n - 1]^R$ and for all $j$ with $0 \le j < i$, $x[i..i + n - 1]$ is not the same as $x[j..j + n - 1]$

is equal to

the number of indices $i$ such that for all $t$, $0 \le t < n/2$, $a(i + t) = a(i + n - 1 - t)$ and for all $j$ with $0 \le j < i$, there exists $u$ with $0 \le u < n$ such that $a(i + u) \ne a(j + u)$. Now apply Theorems 14 and 11. □

Remark 27. Allouche, Baake, Cassaigne, and Damanik [1], Thm. 10, proved that the palindrome complexity of the fixed point of a primitive $k$-uniform morphism is $k$-automatic. Our result is more general: it shows that the palindrome complexity of a $k$-automatic sequence is $k$-regular, and hence is $k$-automatic iff it is bounded.

Jean-Paul Allouche kindly informs us that our result has just been obtained 
independently by Carpi and D'Alonzo [12] . 

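Example 28. Let $f(n)$ denote the number of unbordered factors of length $n$ of the Thue-Morse sequence. Here is a brief table of the values of $f(n)$:

$$
\begin{array}{c|cccccccccccccccc}
n & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ \hline
f(n) & 2 & 2 & 4 & 2 & 4 & 6 & 0 & 4 & 4 & 4 & 4 & 12 & 0 & 4 & 4 & 8
\end{array}
$$
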

By Theorems 14 and 11 we know that $f$ is $(\mathbb{N}, 2)$-regular. Conjecturally, $f$ is given by the system of recurrences

$$
\begin{aligned}
f(4n + 1) &= f(2n + 1) \\
f(8n + 2) &= f(2n + 1) - 8f(4n) + f(4n + 3) + 4f(8n) \\
f(8n + 3) &= 2f(2n) - f(2n + 1) + 5f(4n) + f(4n + 2) - 3f(8n) \\
f(8n + 4) &= -4f(4n) + 2f(4n + 2) + 2f(8n) \\
f(8n + 6) &= 2f(2n) - f(2n + 1) + f(4n) + f(4n + 2) + f(4n + 3) - f(8n) \\
f(16n) &= -2f(4n) + 3f(8n) \\
f(16n + 7) &= -2f(2n) + f(2n + 1) - 5f(4n) + f(4n + 2) + 3f(8n) \\
f(16n + 8) &= -8f(4n) + 4f(4n + 2) + 4f(8n) \\
f(16n + 15) &= -8f(4n) + 2f(4n + 3) + 4f(8n) + f(8n + 7).
\end{aligned}
$$

In principle this could be verified by our method, but we have not yet done so.
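
Small values of $f(n)$ can be recomputed by brute force. The following is a minimal sketch, assuming a finite prefix of the Thue-Morse word is long enough to contain every factor of the lengths considered; the function names and prefix length are ours. This is a direct enumeration, not the automaton-based method of the paper, and it says nothing about the conjectured recurrences.

```python
def thue_morse(length):
    """Prefix of the Thue-Morse word: t[n] is the parity of the number of 1 bits of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


def is_unbordered(w):
    """A word is unbordered if no proper nonempty prefix of it is also a suffix."""
    return all(w[:k] != w[-k:] for k in range(1, len(w)))


def unbordered_factor_count(x, n):
    """Number of distinct unbordered factors of length n occurring in x."""
    factors = {tuple(x[i:i + n]) for i in range(len(x) - n + 1)}
    return sum(1 for w in factors if is_unbordered(w))
```

With `x = thue_morse(1024)`, the values of `unbordered_factor_count(x, n)` for small $n$ can be compared against the table above.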

Theorem 29. Let $x = a(0)a(1)a(2)\cdots$ be a $k$-automatic sequence. Then the following sequences are also $k$-automatic:

(a) $b(i) = 1$ if there is a square beginning at position $i$; otherwise $0$;

(b) $c(i) = 1$ if there is a square centered at position $i$; otherwise $0$;

(c) $d(i) = 1$ if there is an overlap beginning at position $i$; otherwise $0$;

(d) $e(i) = 1$ if there is a palindrome beginning at position $i$; otherwise $0$;

(e) $f(i) = 1$ if there is a palindrome centered at position $i$; otherwise $0$.

Remark 30. Brown, Rampersad, Shallit, and Vasiga proved results (a)-(c) for the 
special case of the Thue-Morse sequence [8] . 

Theorem 31. Let $x$ and $y$ be $k$-automatic sequences. Then the following are $(\mathbb{N}_\infty, k)$-regular:

(a) the number of distinct square factors in $x$ of length $n$;

(b) the number of squares in $x$ beginning at (centered at, ending at) position $n$;

(c) the length of the longest square in $x$ beginning at (centered at, ending at) position $n$;

(d) the number of palindromes in $x$ beginning at (centered at, ending at) position $n$;

(e) the length of the longest palindrome in $x$ beginning at (centered at, ending at) position $n$;

(f) the length of the longest fractional power in $x$ beginning at (ending at) position $n$;

(g) the number of distinct recurrent factors in $x$ of length $n$;

(h) the number of factors of length $n$ that occur in $x$ but not in $y$;

(i) the number of factors of length $n$ that occur in both $x$ and $y$.

Remark 32. Brown, Rampersad, Shallit, and Vasiga proved results (b)-(c) for the 
special case of the Thue-Morse sequence [8] . 

We now turn to some other measures that have received much attention. The recurrence function $R_x(n) = R(n)$ of an infinite word $x$ is the smallest integer $t$ such that every factor of length $t$ of $x$ contains as a factor every factor of length $n$. Said otherwise, it is the size of the smallest "window" one can slide along $x$ and always contain all length-$n$ factors.

Theorem 33. If $x$ is $k$-automatic, then $(R_x(n))_{n \ge 0}$ is $(\mathbb{N}_\infty, k)$-regular.

Proof. We translate the predicate "$R(n) > t$", as follows:

$R(n) > t$

iff

there exist $i \ge 0$, $j \ge 0$ such that $x[j..j + n - 1]$ appears nowhere in $x[i..i + t - 1]$

iff

there exist $i \ge 0$, $j \ge 0$ such that for all integers $l$ with $i \le l \le i + t - 1 - n$ we have $x[l..l + n - 1] \ne x[j..j + n - 1]$

iff

there exist $i \ge 0$, $j \ge 0$ such that for all integers $l$ with $i \le l \le i + t - 1 - n$ there exists $m$, $0 \le m < n$, such that $x[l + m] \ne x[j + m]$.

Now for any fixed $n$, the number of nonnegative integers $t$ for which $R(n) > t$ is equal to $R(n)$. Hence $(R(n))_{n \ge 0}$ is $(\mathbb{N}_\infty, k)$-regular. □

Another measure is called "appearance" [5], §10.10. The appearance function $A_x(n) = A(n)$ is the smallest integer $t$ such that every factor of length $n$ appears in a prefix of length $t$ of $x$. The following result can be proved in an analogous manner to the previous one.

Theorem 34. If $x$ is $k$-automatic, then $(A_x(n))_{n \ge 0}$ is $(\mathbb{N}, k)$-regular.

Next, we consider a measure due to Garel [21]. The separator length $S_x(n)$ is the length of the smallest factor that begins at position $n$ of $x$ and does not occur previously.

Theorem 35. If $x$ is $k$-automatic, then $(S_x(n))_{n \ge 0}$ is $(\mathbb{N}, k)$-regular.

Proof. The predicate "$S_x(n) > t$" is the same as saying that for every $i \le t$ the word of length $i$ beginning at position $n$ of $x$ occurs previously in $x$, which is the same as saying that for all $i$, $0 \le i \le t$, there exists $j$, $0 \le j < n$, such that $x[n..n + i - 1] = x[j..j + i - 1]$. Now look at the pairs $(n, t)$ satisfying this, with $n$ positive. For each $n$ there are exactly $S_x(n)$ different $t$'s that work. □

Remark 36. Garel [21] proved this for the case of a fixed point of a uniform circular morphism; our proof works for the more general case of an arbitrary $k$-automatic sequence.

Carpi and D'Alonzo have introduced a measure they called repetitivity index [11]. This measure $I_x(n)$ is the minimum distance between two consecutive occurrences of the same length-$n$ factor in $x$. But "$I_x(n) > t$" is the same as saying that for all $i, j \ge 0$ with $i \ne j$, the equality $x[i..i + n - 1] = x[j..j + n - 1]$ implies that $|j - i| > t$. Hence we get

Theorem 37. If $x$ is $k$-automatic, then its repetitivity index is $(\mathbb{N}, k)$-regular.

For our final application, Frid and Zamboni [19] introduced the notion of "automatic permutation". This is a permutation of $\mathbb{N}$ based on a $k$-automatic sequence $x$, as follows: we say $i < j$ if the infinite word $x[i..\infty]$ is lexicographically less than the word $x[j..\infty]$. The permutation complexity $p_x(n)$ is the map that sends $n$ to the number of distinct finite permutations of length $n$ induced by $x$ [18].

Theorem 38. The permutation complexity of a $k$-automatic sequence is $(\mathbb{N}, k)$-regular.

Proof. First, we need to see that for $k$-automatic sequences the predicate "the shift of $x$ beginning at position $i$ is lexicographically less than the shift beginning at position $j$" is $k$-automatic.

To see this, given positions $i$ and $j$, we verify that there is some index $t$ such that $a[i + l] = a[j + l]$ for all $l < t$, and also that $a[i + t] < a[j + t]$.

Next, we need to see that, given $i, j, n$, the predicate "the length-$n$ permutation induced by the shifts starting at position $i$ coincides with that starting at $j$" is automatic.

To do this we verify that for all indices $l, m$ with $0 \le l, m < n$, the relation in the previous paragraph holds between $i + l$ and $i + m$ in the same way as it holds for $j + l$ and $j + m$.

In the final step, we enumerate the number of indices $i$ for which the permutation at position $i$ of length $n$ does not match the one occurring at any previous index. This is just the number of distinct permutations of length $n$. □

As a corollary, we recover the result of Widmer [35] that the permutation complexity of the Thue-Morse word is $(\mathbb{N}, 2)$-regular. In principle his description could be mechanically verified.

10. Linear bounds 

Yet another application of our method allows us to obtain linear bounds on many quantities associated with automatic sequences. As a first example, we recover an old result of Cobham [14] on "subword" complexity.

Theorem 39. The number of distinct factors of length $n$ of an automatic sequence is $O(n)$.

Proof. Let $x$ be a $k$-automatic sequence. By Theorem 1 we know that the base-$k$ encoding $S'$ of

$$S = \{(n, l) : \text{for all } j < l \text{ the factor of length } n \text{ starting at position } j \text{ is different from the one starting at position } l\}$$

is a regular language.

Suppose that the factor complexity of $x$ is not $O(n)$. Then for every $L$ there exists some pair $(n, l) \in S$ such that the length of the canonical encoding of $l$ is longer than that of $n$ by at least $L$ digits. So in $S'$ there is some word of the form $(n)_k B^{\ge L} \times (l)_k$, where $(u)_k$ denotes the canonical encoding of $u$ in base $k$ and $\times$ is how we join separate components to form a word.

Since the length of $(l)_k$ is very much longer than that of $(n)_k$, we can apply the pumping lemma to this word, where we only pump in the portion of $(l)_k$ that is longer than $(n)_k$. Hence when we pump, we only add $B$'s to the first component, and so its value remains unchanged. In this way, by pumping we obtain infinitely many values $l'$ such that $(n, l') \in S$. In other words, there are infinitely many distinct factors of length $n$, which is clearly absurd. The contradiction proves the result. □

In a similar manner we can prove that all the quantities in Theorem 31 are either linearly bounded, or unbounded.

11. Other numeration systems

All our results transfer, mutatis mutandis, to the setting of other numeration sys- 
tems where addition can be performed on numbers using a transducer that processes 
numbers starting with the least significant digit. 

A (generalized) numeration system is given by an increasing sequence of integers $U = (U_i)_{i \ge 0}$ such that $U_0 = 1$ and $C_U := \lim_{i \to +\infty} U_{i+1}/U_i$ exists and is finite. Then the canonical $U$-representation of $n$ (with least significant digit first), which is denoted by $(n)_U$, is the unique finite word $w$ over the alphabet $\Sigma_U = \{0, \ldots, \lceil C_U \rceil - 1\}$ not ending with $0$ and satisfying $n = \sum_{i=0}^{|w|-1} w[i]\, U_i$ and, for all $t \in \{0, \ldots, |w| - 1\}$, $\sum_{i=0}^{t} w[i]\, U_i < U_{t+1}$. The notion of $k$-automatic sequence extends naturally to this context: an infinite sequence $x$ is said to be $U$-automatic if it is computable by a finite automaton taking as input the $U$-representation $(n)_U$ of $n$, and having $x[n]$ as the output associated with the last state encountered.
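
As an illustration of this definition, the following is a minimal sketch that computes canonical representations in the Fibonacci numeration system $U = 1, 2, 3, 5, 8, \ldots$, with digits in $\{0, 1\}$ and least significant digit first. It assumes the standard greedy algorithm (repeatedly subtract the largest available $U_i$), which for this system yields the canonical representation; the function names are ours.

```python
def fibonacci_basis(n):
    """Return the terms U_0 = 1, U_1 = 2, U_2 = 3, 5, 8, ... that are at most n (n >= 1)."""
    basis = [1]
    a, b = 1, 2
    while b <= n:
        basis.append(b)
        a, b = b, a + b
    return basis


def canonical_representation(n):
    """Greedy U-representation of n as a string, least significant digit first."""
    if n == 0:
        return ""  # the empty word represents 0
    digits = []
    remainder = n
    for u in reversed(fibonacci_basis(n)):  # most significant position first
        if u <= remainder:
            digits.append("1")
            remainder -= u
        else:
            digits.append("0")
    return "".join(reversed(digits))


def value(word):
    """Evaluate a least-significant-digit-first word in the Fibonacci system."""
    total, a, b = 0, 1, 2
    for digit in word:
        total += int(digit) * a
        a, b = b, a + b
    return total
```

For instance, $4 = U_0 + U_2 = 1 + 3$ gives the word `101`, and $11 = U_2 + U_4 = 3 + 8$ gives `00101`. The greedy output never contains two adjacent 1s, which is how the inequality $\sum_{i \le t} w[i]\, U_i < U_{t+1}$ manifests itself in this system.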

A numeration system $U$ is called linear if $U$ satisfies a linear recurrence relation over $\mathbb{Z}$. A Pisot system is a linear numeration system $U$ whose characteristic polynomial is the minimal polynomial of a Pisot number. Recall that a Pisot number is an algebraic integer greater than 1, all of whose conjugates have moduli less than 1. For example, all integer base numeration systems and the Fibonacci numeration system are Pisot systems. Frougny and Solomyak [20] proved that addition is $U$-recognizable within all Pisot systems $U$, i.e., it can be performed by a finite letter-to-letter transducer reading $U$-representations with least significant digit first. Bruyère and Hansel [9] then proved the following logical characterization of $U$-automatic sequences for Pisot systems: a sequence is $U$-automatic if and only if it is $U$-definable, i.e., it is expressible as a predicate of $\langle \mathbb{N}, +, V_U \rangle$, where $V_U(n)$ is the smallest $U_i$ occurring in $(n)_U$ with a nonzero coefficient. Therefore, if $U$ is a Pisot system, any combinatorial property of $U$-automatic words that can be described by a predicate of $\langle \mathbb{N}, +, V_U \rangle$ is decidable.

The notion of $(R, k)$-regular sequences extends to Pisot numeration systems: an infinite sequence $x$ is said to be $(R, U)$-regular if the series $\sum_{n \ge 0} x[n]\,(n)_U$ is an $R$-recognizable series. Thus we obtain

Theorem 40. Let $U$ be a Pisot numeration system and let $x$ be any $U$-automatic word. The following sequences are $U$-automatic:

(a) $a(n) = 1$ if there is a square beginning at (centered at, ending at) position $n$ of $x$, $0$ otherwise;

(b) $b(n) = 1$ if there is a palindrome beginning at (centered at, ending at) position $n$ of $x$, $0$ otherwise;

(c) $c(n) = 1$ if there is an unbordered factor beginning at (centered at, ending at) position $n$ of $x$, $0$ otherwise.

The following sequences are $(\mathbb{N}_\infty, U)$-regular:

(a) The number of distinct square factors beginning at (centered at, ending at) position $n$ of $x$;

(b) The number of distinct palindromic factors beginning at (centered at, ending at) position $n$ of $x$;

(c) The number of distinct unbordered factors beginning at (centered at, ending at) position $n$ of $x$.

Berstel showed that the cardinality of the set of unnormalized Fibonacci representations is Fibonacci-regular [6], a result also obtained (but not published) by the third author about the same time. In analogy with Theorem 22 we have

Theorem 41. The number of unnormalized representations of $n$ in a Pisot numeration system $U$ is $(\mathbb{N}_\infty, U)$-regular.
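
For the Fibonacci system, the quantity in Theorem 41 can be computed directly for small $n$. The following is a minimal sketch, assuming that an unnormalized representation is any word over $\{0, 1\}$ not ending in $0$ whose value in the system $U = 1, 2, 3, 5, 8, \ldots$ is $n$, that is, a way of writing $n$ as a sum of distinct terms of $U$. This reading of "unnormalized" and the function name are our assumptions; the code is a plain recursive count, not the transducer construction.

```python
from functools import lru_cache


def count_unnormalized(n):
    """Number of ways to write n as a sum of distinct terms of U = 1, 2, 3, 5, 8, ..."""
    basis = [1, 2]
    while basis[-1] <= n:
        basis.append(basis[-1] + basis[-2])

    @lru_cache(maxsize=None)
    def count(remaining, index):
        """Ways to write `remaining` as a sum of distinct terms among basis[0..index]."""
        if remaining == 0:
            return 1
        if index < 0:
            return 0
        ways = count(remaining, index - 1)  # digit 0 at this position
        if basis[index] <= remaining:       # digit 1 at this position
            ways += count(remaining - basis[index], index - 1)
        return ways

    return count(n, len(basis) - 1)
```

For example, $8$ can be written as $8$, $5 + 3$, or $5 + 2 + 1$, so `count_unnormalized(8)` is 3.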

12. Closing remarks 

It may be worth noting that the explicit constructions of automata we have given 
also imply bounds on the smallest example of (or counterexample to) the properties 
we consider. The bounds are essentially given by a tower of exponents whose height 
is related to the number of alternating quantifiers. For example, 

Theorem 42. Suppose $x$ and $y$ are $k$-automatic sequences generated by automata with at most $q$ states. If the set of factors of $x$ differs from the set of factors of $y$,

then there exists a factor of length at most 2 that occurs in one word but not 
the other. 

We also note that a question left open in [2], regarding the description of the 
lexicographically least word in the orbit closure of the Rudin-Shapiro sequence, was 
recently solved by Currie [15] . 

Finally, in a recent paper [34] , the third author shows that additional properties 
of automatic sequences are deducible by expanding on the techniques in this paper. 
For example, the critical exponent is computable. 